Data Analysis and Visualization

Course ID
B.Sc. CS (Hons.)
Paper Type
DSE - 1
Lecture & Practical

Unique Paper Code: Update Awaited

This course introduces students to data analysis and visualization in the field of exploratory data science using Python.

Learning Outcomes:

At the end of the course, students should be able to:

  • Use data analysis tools in the pandas library.
  • Load, clean, transform, merge and reshape data.
  • Handle external files as well as exceptions.
  • Analyze and manipulate time series data.
  • Solve real world data analysis problems.

Course Contents

Unit 1

Introduction: Introduction to Data Science, Exploratory Data Analysis and the Data Science Process. Motivation for using Python for Data Analysis, Introduction to the IPython shell and Jupyter Notebook.

Essential Python Libraries: NumPy, pandas, matplotlib, SciPy, scikit-learn, statsmodels.


Unit 2

Getting Started with pandas: Arrays and vectorized computation, Introduction to pandas Data Structures, Essential Functionality, Summarizing and Computing Descriptive Statistics.
Data Loading, Storage and File Formats: Reading and Writing Data in Text Format, Web Scraping, Binary Data Formats, Interacting with Web APIs, Interacting with Databases.
Data Cleaning and Preparation: Handling Missing Data, Data Transformation, String Manipulation.
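
As an illustration of the data cleaning topics in this unit, a minimal sketch follows. The toy data and the column names `city` and `temp_c` are made up for this example and are not taken from the course material; the fill strategy (column mean) is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "city": [" Delhi", "mumbai ", None, "Delhi"],
    "temp_c": [31.5, np.nan, 28.0, 30.5],
})

# String manipulation: strip stray whitespace and normalise capitalisation.
df["city"] = df["city"].str.strip().str.title()

# Handling missing data: drop rows with no city, then fill the
# remaining missing temperatures with the column mean.
cleaned = df.dropna(subset=["city"]).copy()
cleaned["temp_c"] = cleaned["temp_c"].fillna(cleaned["temp_c"].mean())

print(cleaned)
```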

Unit 3

Data Wrangling: Hierarchical Indexing, Combining and Merging Data Sets, Reshaping and Pivoting.
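
A short sketch of merging and pivoting, assuming two small hypothetical tables (`students` and `marks`, with invented names and scores). It uses the default inner join, so a student with no marks is dropped from the result.

```python
import pandas as pd

# Hypothetical tables; all names and values are illustrative only.
students = pd.DataFrame({
    "roll_no": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meera"],
})
marks = pd.DataFrame({
    "roll_no": [1, 1, 2, 2],
    "paper": ["DAV", "ML", "DAV", "ML"],
    "score": [78, 85, 64, 71],
})

# Combining: inner join on the shared key (roll_no 3 has no marks and is dropped).
merged = students.merge(marks, on="roll_no")

# Reshaping: one row per student, one column per paper.
wide = merged.pivot(index="name", columns="paper", values="score")

print(wide)
```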
Data Visualization with matplotlib: Basics of matplotlib, plotting with pandas and seaborn, other Python visualization tools.

Unit 4

Data Aggregation and Group Operations: GroupBy Mechanics, Data Aggregation, General split-apply-combine, Pivot Tables and Cross-Tabulation.
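
The split-apply-combine idea can be sketched as below. The `sales` table and its columns are hypothetical, chosen only to show a group-wise sum and the equivalent pivot table.

```python
import pandas as pd

# Hypothetical sales data; values are illustrative only.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 4, 7, 3, 8],
})

# Split by region, apply a sum to each group, combine into one Series.
totals = sales.groupby("region")["units"].sum()

# Pivot table: regions as rows, products as columns, summed units as values.
table = sales.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum")

print(totals)
print(table)
```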

Time Series Data Analysis: Date and Time Data Types and Tools, Time Series Basics, Date Ranges, Frequencies and Shifting, Time Zone Handling, Periods and Period Arithmetic, Resampling and Frequency Conversion, Moving Window Functions.
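
A minimal sketch of shifting, resampling and a moving window on a daily series. The dates and values are invented for illustration; the two-day bins and the three-day window are arbitrary choices.

```python
import pandas as pd

# Hypothetical daily series; dates and values are illustrative only.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Shifting: move values forward by one day (the first entry becomes NaN).
shifted = ts.shift(1)

# Resampling: aggregate daily values into two-day bins.
two_day = ts.resample("2D").sum()

# Moving window: three-day rolling mean (the first two entries are NaN).
rolling = ts.rolling(window=3).mean()

print(two_day)
print(rolling)
```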

Unit 5

Advanced Pandas: Categorical Data, Advanced GroupBy Use, Techniques for Method Chaining.


Lab List 1

  1. Practicals based on NumPy ndarray
  2. Practicals based on Pandas Data Structures
  3. Practicals based on Data Loading, Storage and File Formats
  4. Practicals based on Interacting with Web APIs
  5. Practicals based on Data Cleaning and Preparation
  6. Practicals based on Data Wrangling
  7. Practicals based on Data Visualization using matplotlib
  8. Practicals based on Data Aggregation
  9. Practicals based on Time Series Data Analysis

Additional Information

Text Books

McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd edition. O’Reilly Media.
O’Neil, C., & Schutt, R. (2013). Doing Data Science: Straight Talk from the Frontline. O’Reilly Media.

Teaching Learning Process

Use of ICT tools in conjunction with traditional classroom teaching methods
Interactive sessions
Class discussions

Assessment Methods

Written tests, assignments, quizzes, presentations as announced by the instructor in the class


