Week 4 - Data Processing with Pandas
Introduction to Pandas and DataFrames
Selecting, Filtering, and Sorting Data
Joining and Merging DataFrames
Working with Strings and Dates
Assignment: MessyCorp Goes Pandas
Once your data is clean, you often need to reshape it for dashboards, machine learning features, or warehouse tables. These transformations are where Pandas shines, but also where performance traps appear.
Use pivot to go from long to wide. Use melt to go from wide to long.
# Long to wide
pivoted = orders.pivot(index="customer_id", columns="month", values="amount")
# Wide to long
long = pivoted.reset_index().melt(
id_vars="customer_id",
var_name="month",
value_name="amount",
)
<aside> 🤓 Curious Geek: Pivot tables
The term comes from spreadsheet tools, where pivot tables helped analysts summarize large tables quickly.
</aside>
pivot_table for Aggregationssummary = orders.pivot_table(
index="region",
columns="channel",
values="amount",
aggfunc="sum",
fill_value=0,
)
Use map or replace for simple lookups. Use apply only when needed.
region_map = {"NL": "Europe", "US": "North America"}
orders["region_group"] = orders["country"].map(region_map).fillna("Other")
Rolling and cumulative metrics are common in analytics.
orders = orders.sort_values("order_date")
orders["rolling_7d"] = orders["amount"].rolling(7, min_periods=1).sum()
orders["cumulative"] = orders["amount"].cumsum()
orders["prev_amount"] = orders["amount"].shift(1)
Vectorized operations are faster and clearer.
# Slow loop
orders["amount_usd"] = [amount * 1.1 for amount in orders["amount"]]
# Vectorized
orders["amount_usd"] = orders["amount"] * 1.1
<aside>
⚠️ apply can be much slower than vectorized operations. Reach for apply only when no vectorized option exists.
</aside>
<aside>
⌨️ Hands on: Use this sample table. Pivot by region and month, then melt it back into a long format.
import pandas as pd
sales = pd.DataFrame(
{
"region": ["NL", "NL", "BE", "BE"],
"month": ["2024-01", "2024-02", "2024-01", "2024-02"],
"amount": [120, 150, 90, 110],
}
)
🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=4&chapter=advanced_transformations&exercise=w4_advanced_transformations__pivot_sales&lang=python
💭 The widget uses plain Python lists of dictionaries to mimic a table.
</aside>
amount per region and channel.rolling_30d column that sums the last 30 days of orders.map.The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

Found a mistake or have a suggestion? Let us know in the feedback form.