Introduction to Pandas
DataFrame Operations
Grouping and Aggregation
Joining and Merging
Different Data Types
Advanced Transformations
Writing Data
Alternatives to Pandas
Practice
Assignment
Gotchas & Pitfalls
Week 4 - Data Processing with Pandas
Welcome to Week 4! You have learned how to structure code (Week 2) and how to ingest and validate data (Week 3). Now it's time to process that data. This week introduces Pandas, the de facto standard Python library for tabular data manipulation, which gets its speed from vectorized operations that work on whole columns at once. You will also learn about modern data architectures (ETL vs ELT) and efficient storage formats such as Parquet.
By the end of this week, you will be able to load complex datasets, transform them efficiently using vectorized operations, and describe the architectural trade-offs between traditional ETL and modern ELT pipelines.
Learning goals
- Master the Pandas library for tabular data manipulation (DataFrames and Series)
- Select, filter, and sort data efficiently using loc, iloc, and boolean indexing
- Perform grouping and aggregation operations to summarize data by categories
- Join and merge multiple DataFrames using different join types (inner, outer, left, right)
- Clean and transform text data using string operations and pattern matching
- Work with datetime data: parsing, extracting components, and time-based calculations
- Apply advanced transformations: pivoting, melting, window functions, and vectorized operations
- Replace slow Python loops with high-performance vectorized operations
- Handle data quality issues (missing values, duplicates) within DataFrames
- Export processed data to CSV, Parquet, and SQLite databases
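To give a feel for several of these goals before the chapters go into detail, here is a minimal sketch that combines vectorized arithmetic, boolean indexing with loc, a left join, and a grouped aggregation. The `orders` and `customers` tables, their column names, and their values are hypothetical and invented purely for illustration; they do not come from the course material.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "quantity": [2, 1, 5, 3],
    "unit_price": [9.99, 24.50, 3.00, 12.00],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "country": ["NL", "DE", "NL"],
})

# Vectorized arithmetic: one expression over whole columns, no Python loop.
orders["total"] = orders["quantity"] * orders["unit_price"]

# Boolean indexing with loc: keep rows whose total exceeds 15,
# and only the two columns we care about.
large_orders = orders.loc[orders["total"] > 15, ["order_id", "total"]]

# Left join keeps every order and attaches the customer's country.
merged = orders.merge(customers, on="customer_id", how="left")

# Grouping and aggregation: total revenue per country.
revenue_by_country = merged.groupby("country")["total"].sum()

print(large_orders)
print(revenue_by_country)
```

Each step here is a single expression on a whole column or table, which is the habit the week aims to build in place of row-by-row loops.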

*https://hackyourfuture.net/*