Week 4 - Data Processing with Pandas
Introduction to Pandas and DataFrames
Selecting, Filtering, and Sorting Data
Joining and Merging DataFrames
Working with Strings and Dates
Assignment: MessyCorp Goes Pandas
Content
Concepts: DataFrame creation, info(), describe(), missing values.
Create a small dataset and explore it.
from io import StringIO
import pandas as pd
csv_data = StringIO(
"""order_id,customer_id,region,amount,order_date
1,100,NL,120,2024-01-02
2,101,BE,90,2024-01-03
3,102,NL,,2024-01-03
4,103,DE,200,2024-01-04
5,100,NL,50,2024-01-05
"""
)
orders = pd.read_csv(csv_data)
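The concepts for this exercise can be tried first on a smaller table. The following is a minimal sketch, not part of the assignment data: the scores table and its column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical table (not the exercise data). None in a numeric
# column is stored as NaN, which pandas treats as a missing value.
scores = pd.DataFrame(
    {
        "student": ["Ana", "Ben", "Cem", "Dia"],
        "score": [8.0, None, 6.5, 9.0],
    }
)

# info() prints column dtypes and non-null counts.
scores.info()

# describe() returns summary statistics for numeric columns;
# missing values are left out of the calculation.
summary = scores.describe()

# Count missing values per column.
missing_per_column = scores.isna().sum()

# fillna returns a new DataFrame; the original stays unchanged.
filled = scores.fillna({"score": 0})
```

Because fillna returns a copy here, the result has to be assigned to a name (or back to the original variable) to keep it.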
Instructions:
Run orders.info() and orders.describe().
Replace missing amount values with 0.
Concepts: Boolean masks, loc, sorting.
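As a rough illustration of Boolean masks, loc, and sorting, here is a sketch on a hypothetical products table. The table, its columns, and the thresholds are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical table (not the exercise data).
products = pd.DataFrame(
    {
        "name": ["pen", "desk", "lamp", "chair"],
        "category": ["office", "furniture", "office", "furniture"],
        "price": [2, 150, 30, 85],
    }
)

# A Boolean mask is a Series of True/False values, one per row.
# Combine conditions with & (and) or | (or); each condition needs
# its own parentheses.
mask = (products["category"] == "furniture") & (products["price"] > 100)

# loc selects rows by mask and, optionally, columns by label.
expensive_furniture = products.loc[mask, ["name", "price"]]

# sort_values returns a sorted copy; ascending=False puts the
# largest values first.
by_price = products.sort_values("price", ascending=False)

# A comparison can be stored directly as a True/False column.
products["is_cheap"] = products["price"] < 50
```

Using & instead of the Python keyword and matters: and tries to reduce a whole Series to a single True/False value and raises an error.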
Instructions:
Filter the rows where region is NL and amount is greater than 80.
Sort the result by amount descending.
Add a Boolean column is_big_order that is True where amount >= 150.
Concepts: groupby, agg, transform.
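The difference between agg and transform is easiest to see side by side. The sketch below uses a hypothetical sales table; the names store, units, and store_mean are illustrative assumptions.

```python
import pandas as pd

# Hypothetical table (not the exercise data).
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "B"],
        "units": [3, 5, 2, 4, 9],
    }
)

# agg collapses each group to a single row. Named aggregation
# (new_name=(column, function)) sets the output column names.
per_store = sales.groupby("store").agg(
    total_units=("units", "sum"),
    n_sales=("units", "count"),
)

# transform returns one value per original row, aligned to the
# original index, so it can be assigned back as a new column.
sales["store_mean"] = sales.groupby("store")["units"].transform("mean")

# Each store should now carry exactly one distinct store_mean value.
distinct_means = sales.groupby("store")["store_mean"].nunique()
```

Here per_store has two rows (one per store), while sales keeps all five rows and gains a column that repeats the group mean.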
Instructions:
Group by region and calculate total revenue and order count.
Add a column region_avg with the average order amount per region.
Check that region_avg is the same for all rows in the same region.
Concepts: merge, join types.
Create a customer table:
customers = pd.DataFrame(
    {
        "customer_id": [100, 101, 102, 103],
        "name": ["Alice", "Bob", "Chloe", "Daan"],
        "segment": ["retail", "retail", "b2b", "b2b"],
    }
)
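To see how join types differ, it can help to merge two tiny tables whose keys only partly overlap. This is a sketch with hypothetical frames named left and right; they are not the orders and customers tables.

```python
import pandas as pd

# Hypothetical frames: keys 2 and 3 appear in both, key 1 only on
# the left, key 4 only on the right.
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# inner: keep only keys present in both frames.
inner = left.merge(right, on="key", how="inner")

# left: keep every row of the left frame; columns from the right
# frame are NaN where no match exists (key 1 here).
left_join = left.merge(right, on="key", how="left")

# outer: keep keys from either frame. indicator=True adds a _merge
# column showing where each row came from.
outer = left.merge(right, on="key", how="outer", indicator=True)
```

The _merge column is a quick way to spot rows that found no partner, for example orders that refer to an unknown customer.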
Instructions:
Merge orders with customers using a left join on customer_id.
Summarize the merged data by segment.
Concepts: pivot_table, file output.
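A pivot table and its file output might look like the sketch below. The visits table, the file names, and the temporary output directory are all assumptions made for illustration. Writing Parquet also assumes that an optional engine (pyarrow or fastparquet) is installed, so that step is guarded.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical table (not the exercise data).
visits = pd.DataFrame(
    {
        "city": ["Utrecht", "Utrecht", "Gent", "Gent", "Utrecht"],
        "day": ["Mon", "Tue", "Mon", "Mon", "Mon"],
        "count": [10, 7, 4, 6, 5],
    }
)

# One row per city, one column per day; cells hold the summed count.
# fill_value=0 replaces combinations that have no data.
table = visits.pivot_table(
    index="city", columns="day", values="count", aggfunc="sum", fill_value=0
)

# Write into a temporary directory so the example leaves no files
# behind in the project folder. The directory must exist first.
out_dir = Path(tempfile.mkdtemp()) / "output"
out_dir.mkdir(parents=True, exist_ok=True)

csv_path = out_dir / "visits_pivot.csv"
table.to_csv(csv_path)

# Flatten to ordinary columns before writing Parquet (a choice made
# for this sketch, to keep the file layout simple).
flat = table.reset_index()
flat.columns.name = None
try:
    flat.to_parquet(out_dir / "visits_pivot.parquet")
except ImportError:
    print("Install pyarrow or fastparquet to write Parquet files")
```

Reading the CSV back with pd.read_csv(csv_path, index_col="city") restores the city labels as the index.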
Instructions:
Create a pivot table of amount by region and order_date.
Save the result to output/pivot.csv and output/pivot.parquet.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
