Week 4 - Data Processing with Pandas
Introduction to Pandas and DataFrames
Selecting, Filtering, and Sorting Data
Joining and Merging DataFrames
Working with Strings and Dates
Assignment: MessyCorp Goes Pandas
Real-world data is messy. Names have extra spaces, emails have mixed case, and dates come in ten different formats. Pandas gives you vectorized tools to clean this without loops.
Use the .str accessor to apply string logic to a full column.
users["email"] = users["email"].str.strip().str.lower()
users["country"] = users["country"].str.upper()
users["has_gmail"] = users["email"].str.contains("@gmail.com", na=False)
users["first_name"] = users["name"].str.split(" ").str[0]
users["phone"] = users["phone"].str.replace(" ", "", regex=False)
users["area_code"] = users["phone"].str.extract(r"(\\d{2,3})")
<aside> 💡 If a pattern is complex, ask an LLM to draft the regex, then test it on real data before using it in production.
</aside>
<aside> 🤓 Curious Geek: ISO 8601
The YYYY-MM-DD format is part of the ISO 8601 date standard. It sorts correctly as a string.
</aside>
pd.to_datetime converts strings into datetime objects.
orders["order_date"] = pd.to_datetime(
orders["order_date"],
errors="coerce",
format="%Y-%m-%d",
)
errors="coerce" turns invalid values into NaT so you can handle them later.
orders["year"] = orders["order_date"].dt.year
orders["month"] = orders["order_date"].dt.month
orders["weekday"] = orders["order_date"].dt.day_name()
orders["days_since_order"] = (pd.Timestamp("today") - orders["order_date"]).dt.days
orders["order_date"] = orders["order_date"].dt.tz_localize("UTC")
orders["order_date_eu"] = orders["order_date"].dt.tz_convert("Europe/Amsterdam")
monthly = orders.set_index("order_date").resample("M")["amount"].sum()
<aside>
⌨️ Hands on: Use this sample table. Clean the city column by stripping spaces and converting to title case. Then parse signup_date and extract the year.
import pandas as pd
users = pd.DataFrame(
{
"name": ["Alice", "Bob", "Chloe"],
"city": [" amsterdam", "ROTTERDAM ", " the hague "],
"signup_date": ["2024-01-05", "2024/01/10", "invalid"],
}
)
🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=4&chapter=strings_dates&exercise=w4_strings_dates__clean_city_year&lang=python
💭 The widget uses plain Python lists of dictionaries to mimic a table.
</aside>
email by lowercasing and trimming spaces.invoice_id from a column like INV-2024-0012.created_at column with mixed formats and count invalid rows.The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

Found a mistake or have a suggestion? Let us know in the feedback form.