Week 4 - Data Processing with Pandas
Introduction to Pandas and DataFrames
Selecting, Filtering, and Sorting Data
Joining and Merging DataFrames
Working with Strings and Dates
Assignment: MessyCorp Goes Pandas
Visualization helps you spot outliers, trends, and data quality issues quickly. Even in data engineering, a simple chart can reveal broken joins, missing values, or unexpected spikes before your pipeline ships bad data.
DataFrame.plotPandas integrates with Matplotlib, so you can create quick charts with one line.
import pandas as pd
sales = pd.DataFrame(
{
"date": ["2024-01-01", "2024-01-02", "2024-01-03"],
"amount": [120, 90, 150],
}
)
sales.plot(x="date", y="amount", kind="line", title="Daily revenue")
Common plot types:
kind="line" for trendskind="bar" for category comparisonskind="hist" for distributionskind="box" for outliers<aside> ⚠️ Always sort by date before plotting time series. Unsorted dates create misleading lines.
</aside>
<aside> 🤓 Curious Geek: Anscombe's quartet
Four datasets can have identical statistics but very different plots. Visualization protects you from false confidence.
</aside>
Aggregations often become charts.
daily = sales.groupby("date", as_index=False)["amount"].sum()
daily.plot(x="date", y="amount", kind="line")
Use Matplotlib directly when you need labels, size control, or export.
import matplotlib.pyplot as plt
ax = daily.plot(x="date", y="amount", kind="line", figsize=(6, 3))
ax.set_xlabel("Date")
ax.set_ylabel("Revenue")
plt.tight_layout()
plt.savefig("output/daily_revenue.png")
<aside> 💡 Saving charts to files is useful for automated reports and dashboards.
</aside>
<aside>
⌨️ Hands on: Use this sample table. Group by date, calculate total revenue, and plot a line chart.
import pandas as pd
sales = pd.DataFrame(
{
"date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"],
"amount": [120, 80, 90, 150],
}
)
🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=4&chapter=visualizing_data&exercise=w4_visualizing_data__daily_totals&lang=python
💭 The widget uses plain Python lists of dictionaries to produce line-chart points.
</aside>
output/ and verify the file exists.The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

Found a mistake or have a suggestion? Let us know in the feedback form.