Week 4 - Data Processing with Pandas
Introduction to Pandas and DataFrames
Selecting, Filtering, and Sorting Data
Joining and Merging DataFrames
Working with Strings and Dates
Assignment: MessyCorp Goes Pandas
Real datasets almost never live in a single table. You will combine customers, orders, products, and payments all the time. If you join incorrectly, your numbers are wrong.
merged = orders.merge(customers, on="customer_id", how="left")
<aside> 🤓 Curious Geek: Join diagrams
Many tutorials use Venn diagrams for joins. They are a helpful metaphor, but real joins are based on keys, not sets.
</aside>
merged = orders.merge(
prices,
on=["product_id", "price_date"],
how="left",
)
indicatormerged = orders.merge(customers, on="customer_id", how="left", indicator=True)
merged["_merge"].value_counts()
_merge helps you spot rows that failed to match.
If your key is the index, use join.
customers = customers.set_index("customer_id")
orders = orders.set_index("customer_id")
merged = orders.join(customers, how="left")
concat stacks DataFrames vertically or horizontally.
y2024 = pd.read_csv("data/orders_2024.csv")
y2025 = pd.read_csv("data/orders_2025.csv")
all_orders = pd.concat([y2024, y2025], ignore_index=True)
<aside> ⚠️ Many-to-many joins can multiply rows. If both tables have duplicate keys, the result size can explode.
</aside>
<aside>
⌨️ Hands on: Use these sample tables. Join orders with customers on customer_id. Use indicator=True and count how many rows are left_only.
import pandas as pd
orders = pd.DataFrame(
{
"order_id": [1, 2, 3, 4],
"customer_id": [100, 101, 999, 102],
"amount": [120, 90, 200, 75],
}
)
customers = pd.DataFrame(
{
"customer_id": [100, 101, 102],
"name": ["Alice", "Bob", "Chloe"],
}
)
🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=4&chapter=joining_merging&exercise=w4_joining_merging__left_join_customers&lang=python
💭 The widget uses plain Python lists of dictionaries to mimic a table.
</aside>
customers DataFrame to orders using a left join.suffixes=("_order", "_customer").concat to combine two monthly order files into one DataFrame.The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

Found a mistake or have a suggestion? Let us know in the feedback form.