Week 4: Data Processing


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Pandas and DataFrames

Selecting and Filtering Data

Grouping and Aggregation

Joining and Merging DataFrames

Working with Strings and Dates

Advanced Transformations

Writing Data

Visualizing Data with Pandas

Alternatives to Pandas

Jupyter Notebooks

Practice

Assignment: MessyCorp Pandas

Gotchas & Pitfalls

Week 4 Kickoff Slides

Career relevance: Week 4

Pandas Cheatsheet

Week 4 Glossary

Going Further: Optional Deep Dives

Welcome to Week 4! You have learned how to structure code (Week 2) and how to ingest and validate data (Week 3). Now it is time to process that data at scale. This week introduces Pandas, the de facto standard Python library for fast, in-memory manipulation of tabular data. You will also learn about modern data architectures and efficient storage formats such as Parquet.

By the end of this week, you will be able to load complex datasets, transform them efficiently using vectorized operations, and produce clean, reusable outputs for downstream systems.
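As a small preview of what "vectorized operations" means, here is a minimal sketch. The `orders` table and its column names are made up for illustration and are not part of the course materials; the point is only that a single column-level expression replaces an explicit Python loop.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
orders = pd.DataFrame(
    {
        "product": ["apple", "banana", "cherry"],
        "quantity": [3, 10, 2],
        "unit_price": [0.50, 0.25, 4.00],
    }
)

# Vectorized: one expression multiplies the two columns element by element.
orders["total"] = orders["quantity"] * orders["unit_price"]

# The same calculation as a row-by-row loop, shown only for comparison.
loop_totals = [
    row.quantity * row.unit_price for row in orders.itertuples(index=False)
]

# Filtering is vectorized too: the comparison builds a boolean mask
# that selects the matching rows in one step.
large_orders = orders[orders["total"] > 2.0]

print(orders)
print(large_orders)
```

Both approaches give the same totals, but the vectorized form is shorter and, on large tables, usually much faster because the work happens in optimized compiled code rather than in a Python loop.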

Learning goals

Supplementary


First lesson: Introduction to Pandas and DataFrames