Week 4 - Data Processing with Pandas


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*

Introduction to Pandas and DataFrames

Selecting, Filtering, and Sorting Data

Grouping and Aggregation

Joining and Merging DataFrames

Working with Strings and Dates

Advanced Transformations

Writing Data

Visualizing Data with Pandas

Alternatives to Pandas

Practice

Assignment: MessyCorp Goes Pandas

Gotchas & Pitfalls

Lesson Plan

Welcome to Week 4! You have learned how to structure code (Week 2) and how to ingest and validate data (Week 3). Now it is time to process that data at scale. This week introduces Pandas, the industry-standard tool for high-performance data manipulation in Python. You will also learn about modern data architectures and efficient storage formats like Parquet.

By the end of this week, you will be able to load complex datasets, transform them efficiently using vectorized operations, and produce clean, reusable outputs for downstream systems.
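
To make the load, transform, and output steps concrete, here is a minimal sketch. The sample data and the column names (`order_id`, `quantity`, `unit_price`, `total`) are made up for illustration and are not part of the course material.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real file on disk.
raw_csv = io.StringIO(
    "order_id,quantity,unit_price\n"
    "1,2,9.50\n"
    "2,1,20.00\n"
    "3,5,2.50\n"
)

# Load: read_csv accepts a file path or any file-like object.
orders = pd.read_csv(raw_csv)

# Transform: one vectorized expression computes the value for every row,
# with no explicit Python loop.
orders["total"] = orders["quantity"] * orders["unit_price"]

# Output: with no path given, to_csv returns the CSV text as a string.
clean_csv = orders.to_csv(index=False)
print(clean_csv)
```

In a real pipeline the last step would more likely write to a file, for example with `orders.to_parquet(...)`, which needs an extra engine such as `pyarrow` or `fastparquet` installed.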

Learning goals


First lesson: Introduction to Pandas and DataFrames

Lesson plan