This course introduces students to exploratory data analysis and visualization using Python.
At the end of the course, students should be able to:
Introduction: Introduction to Data Science, Exploratory Data Analysis and the Data Science Process. Motivation for using Python for data analysis. Introduction to the IPython shell and Jupyter Notebook.
Essential Python Libraries: NumPy, pandas, matplotlib, SciPy, scikit-learn, statsmodels.
Getting Started with pandas: Arrays and Vectorized Computation, Introduction to pandas Data Structures, Essential Functionality, Summarizing and Computing Descriptive Statistics.
Data Loading, Storage and File Formats: Reading and Writing Data in Text Format, Web Scraping, Binary Data Formats, Interacting with Web APIs, Interacting with Databases.
Data Cleaning and Preparation: Handling Missing Data, Data Transformation, String Manipulation.
Data Wrangling: Hierarchical Indexing, Combining and Merging Data Sets, Reshaping and Pivoting.
Data Visualization with matplotlib: Basics of matplotlib, Plotting with pandas and seaborn, Other Python Visualization Tools.
Data Aggregation and Group Operations: GroupBy Mechanics, Data Aggregation, General Split-Apply-Combine, Pivot Tables and Cross-Tabulation.
Time Series Data Analysis: Date and Time Data Types and Tools, Time Series Basics, Date Ranges, Frequencies and Shifting, Time Zone Handling, Periods and Period Arithmetic, Resampling and Frequency Conversion, Moving Window Functions.
Advanced Pandas: Categorical Data, Advanced GroupBy Use, Techniques for Method Chaining.
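For orientation only, here is a short sketch that touches several of the topics above: handling missing data, categorical data, a split-apply-combine aggregation written as a method chain, and resampling a time series. It assumes a recent pandas release (1.x or 2.x). The sales table, its column names, and the choice to fill gaps with zero are hypothetical and serve only as an illustration; they are not taken from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for two stores; values invented for illustration.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=6, freq="D"),
    "store": ["A", "B", "A", "B", "A", "B"],
    "sales": [10.0, np.nan, 12.0, 8.0, np.nan, 9.0],
})

# Method chaining: clean the data, convert to categorical, then split-apply-combine.
per_store = (
    df.assign(
        sales=lambda d: d["sales"].fillna(0.0),          # missing data -> 0 (one possible strategy)
        store=lambda d: d["store"].astype("category"),   # categorical data
    )
    .groupby("store", observed=True)["sales"]            # split by store
    .sum()                                                # apply and combine
)

# Time series: index by date and resample daily values to weekly totals.
# The sum skips NaN values by default.
weekly = df.set_index("date")["sales"].resample("W").sum()

print(per_store)
print(weekly)
```

With `freq="W"`, pandas labels each weekly bin by the Sunday that closes it, so the six days starting on Monday 2024-01-01 fall into a single bin labelled 2024-01-07.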