Week 4 - Data Processing with Pandas

🗓️ Lesson Plan

This week is about making data usable. Students should leave with confidence in selecting, grouping, and reshaping data with Pandas, and they should understand how those transformations map to real reporting tasks.

Goals

By the end of this lesson, students should be able to:

Inspect a DataFrame and spot data quality issues quickly.

Filter and sort data with loc, iloc, and boolean masks.

Group and aggregate to produce summary metrics.

Join two datasets without duplicating or losing rows.

Visualize a quick trend plot for sanity checks.

Export results to CSV or Parquet.

Schedule

Time   Activity                             Duration
0:00   Welcome and warm-up                  5 min
0:05   Live demo: DataFrame basics          15 min
0:20   Filtering and sorting mini-lab       20 min
0:40   Break                                10 min
0:50   Groupby and aggregation workshop     25 min
1:15   Joins and merges demo                15 min
1:30   Writing outputs and Azure context    10 min
1:40   Assignment walkthrough               15 min
1:55   Q&A and wrap-up                      5 min
2:00   End                                  -

Total: 2 hours


Live Demo: DataFrame Basics

Goal: Show how fast Pandas can make sense of messy data.

  1. Load a small CSV.
  2. Run info() and describe().
  3. Fix a column type and handle a missing value.
  4. Plot a quick line chart to validate a trend.
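
The demo steps above could be sketched roughly as follows. The inline CSV text and the column names (order_date, region, amount) are made up for illustration and stand in for whatever small file is used in class. Filling the gap with the median is only one possible choice.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the small CSV used in the demo.
csv_text = """order_date,region,amount
2024-01-01,North,100
2024-01-02,South,
2024-01-03,North,250
2024-01-04,South,175
"""

# Step 1: load the CSV (read from a string here so the sketch is self-contained).
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: inspect column types, non-null counts, and summary statistics.
df.info()
print(df.describe())

# Step 3: fix a column type and handle a missing value.
df["order_date"] = pd.to_datetime(df["order_date"])
df["amount"] = df["amount"].fillna(df["amount"].median())

# Step 4: a quick line chart for a sanity check. This needs matplotlib
# installed, so it is left commented out in this sketch.
# df.plot(x="order_date", y="amount", kind="line")
```

The info() output is a good place to point out that order_date loads as a generic object column and that amount has one fewer non-null value than the other columns.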

Mini-Lab: Filtering and Sorting

Goal: Build confidence with loc, iloc, and boolean masks.

  1. Filter to a region.
  2. Filter to a price range.
  3. Sort by amount and add a rank column.
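
A minimal sketch of the three lab tasks, assuming a hypothetical orders table with region, price, and amount columns. The values and thresholds are invented for the example.

```python
import pandas as pd

# Hypothetical orders data; column names and values are assumptions.
orders = pd.DataFrame({
    "region": ["North", "South", "North", "East", "North"],
    "price": [20.0, 55.0, 80.0, 35.0, 45.0],
    "amount": [200, 550, 160, 350, 450],
})

# 1. Filter to a region with a boolean mask inside loc.
north = orders.loc[orders["region"] == "North"]

# 2. Filter to a price range; between() includes both endpoints by default.
mid_price = orders.loc[orders["price"].between(30, 60)]

# 3. Sort by amount (largest first) and add a rank column.
ranked = orders.sort_values("amount", ascending=False).reset_index(drop=True)
ranked["rank"] = ranked["amount"].rank(method="dense", ascending=False).astype(int)

# iloc selects by position: the first three rows of the sorted table.
top3 = ranked.iloc[:3]
print(top3)
```

The contrast worth drawing out is that loc works with labels and masks, while iloc only cares about row position after sorting.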

Workshop: Groupby and Aggregation

Goal: Produce a real report table.

  1. Group by region and calculate total revenue.
  2. Add average order value.
  3. Explain the difference between count (non-null values only) and size (all rows, including those with missing values).
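
One way the workshop report might look, using named aggregation. The data is hypothetical and includes one missing amount on purpose, so that count and size give different answers.

```python
import numpy as np
import pandas as pd

# Hypothetical orders; one South order has no amount recorded.
orders = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "amount": [100.0, 300.0, 50.0, np.nan, 150.0],
})

report = (
    orders.groupby("region")
    .agg(
        total_revenue=("amount", "sum"),         # NaN is skipped
        avg_order_value=("amount", "mean"),      # mean over non-null amounts
        orders_with_amount=("amount", "count"),  # count ignores NaN
        rows=("amount", "size"),                 # size counts every row
    )
    .reset_index()
)
print(report)
```

For South, count is 2 while size is 3, which makes the difference concrete: the average order value is computed over two orders, not three.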

Demo: Joins and Merges

Goal: Show how bad joins create wrong numbers.

  1. Join customers to orders.
  2. Show the effect of duplicate keys.
  3. Use indicator=True to find unmatched rows.
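
The following sketch shows how a duplicate key can inflate a total, and how indicator=True exposes unmatched rows. The tables and their contents are invented for the demo. The validate argument is an extra safety check that is not part of the steps above.

```python
import pandas as pd

# Hypothetical tables; customer_id 2 appears twice by mistake.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "name": ["Ana", "Ben", "Ben", "Cy"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 2, 4],  # customer 4 does not exist in customers
    "amount": [100, 200, 50],
})

# Duplicate keys on the right side repeat order 11, so revenue is overstated.
bad = orders.merge(customers, on="customer_id", how="left")
print(len(orders), "orders became", len(bad), "rows")
print("true revenue:", orders["amount"].sum(), "joined revenue:", bad["amount"].sum())

# Deduplicate the lookup table, then let pandas verify key uniqueness.
clean_customers = customers.drop_duplicates(subset="customer_id")
checked = orders.merge(
    clean_customers,
    on="customer_id",
    how="left",
    validate="many_to_one",  # raises MergeError if right-side keys repeat
    indicator=True,          # adds a _merge column: both / left_only / right_only
)

# Orders with no matching customer.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched)
```

Comparing row counts and totals before and after a merge is a cheap habit that catches most join mistakes.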

Writing Outputs and Azure Context

Goal: Explain why format and storage matter.

  1. Write to CSV and Parquet.
  2. Discuss why Parquet often suits analytics better than CSV: it is columnar, compressed, and keeps column types.
  3. Briefly show how the file could be uploaded to Azure Blob Storage.
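
A small sketch of the two write steps, assuming a hypothetical report table and a temporary output folder. to_parquet needs an optional engine such as pyarrow or fastparquet, so that call is guarded here. The Azure Blob Storage upload is not shown because it depends on credentials and a separate SDK.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical summary table to export.
report = pd.DataFrame({
    "region": ["North", "South"],
    "total_revenue": [400.0, 200.0],
})

# Temporary folder so the sketch does not touch the working directory.
out_dir = Path(tempfile.mkdtemp())

# CSV: plain text, readable anywhere, but column types are not stored.
csv_path = out_dir / "report.csv"
report.to_csv(csv_path, index=False)

# Parquet: columnar and typed; pandas raises ImportError without an engine.
parquet_path = out_dir / "report.parquet"
try:
    report.to_parquet(parquet_path, index=False)
    parquet_written = True
except ImportError:
    parquet_written = False

# Read the CSV back to confirm the export worked.
round_trip = pd.read_csv(csv_path)
print(round_trip)
print("Parquet written:", parquet_written)
```

Passing index=False keeps the default integer index out of the file, which avoids a stray unnamed column when the CSV is read back.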

Assignment Walkthrough

Goal: Connect the chapters to the assignment tasks.


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

https://hackyourfuture.net/