Visualization helps you spot outliers, trends, and data quality issues quickly. Even in data engineering, a simple chart can reveal broken joins, missing values, or unexpected spikes before your pipeline ships bad data.
By the end of this chapter, you should be able to create quick diagnostic charts from a DataFrame, group data before plotting, and save charts to files using Matplotlib.
<aside>
📦 Run the examples: companion_ch8_visualizing_data.py. Run it in the Codespace or clone it locally to follow along with this chapter.
</aside>
import matplotlib
matplotlib.use("Agg") # headless backend: no display required
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
Path("output").mkdir(exist_ok=True)
Pandas integrates with Matplotlib through DataFrame.plot, so you can create a quick chart with one line.
import pandas as pd
sales = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-02", "2024-01-03"],
        "amount": [120, 90, 150],
    }
)
sales.plot(x="date", y="amount", kind="line", title="Daily revenue")
Common plot types:
- kind="line" for trends
- kind="bar" for category comparisons
- kind="hist" for distributions
- kind="box" for outliers
<aside> ⚠️ Always sort by date before plotting time series. A line plot connects points in row order, so unsorted dates produce a zigzag that misrepresents the trend.
</aside>
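The warning above can be sketched as follows. This is a minimal example with made-up data (the `readings` frame and its values are assumptions for illustration): it parses the date strings with `pd.to_datetime`, sorts with `sort_values`, and only then plots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as in the chapter setup
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, deliberately out of order.
readings = pd.DataFrame(
    {
        "date": ["2024-01-03", "2024-01-01", "2024-01-02"],
        "amount": [150, 120, 90],
    }
)

# Parse the strings into real timestamps, then sort chronologically.
readings["date"] = pd.to_datetime(readings["date"])
readings = readings.sort_values("date").reset_index(drop=True)

ax = readings.plot(x="date", y="amount", kind="line", title="Daily revenue (sorted)")
plt.close(ax.figure)
```

Parsing to real timestamps also means the x-axis is spaced by actual time rather than by row position, which matters once days are missing from the data.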
Even a simple line chart can reveal pipeline issues that summary statistics hide.
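As a rough illustration of that point, the sketch below uses invented numbers (the `runs` frame is hypothetical) where one pipeline run produced far more rows than usual, as might happen after a join that duplicates rows. The mean alone looks unremarkable; the chart and the 1.5 × IQR rule, which a default box plot also uses to mark outliers, both point at the odd run.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical row counts per pipeline run; the last run is suspicious.
runs = pd.DataFrame(
    {
        "run": [1, 2, 3, 4, 5, 6],
        "row_count": [1000, 1010, 990, 1005, 995, 3000],
    }
)

# The mean alone (about 1333) does not say which run is off.
mean_rows = runs["row_count"].mean()

# Flag values above the upper fence of the 1.5 * IQR rule.
q1 = runs["row_count"].quantile(0.25)
q3 = runs["row_count"].quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
suspicious = runs[runs["row_count"] > upper_fence]

ax = runs.plot(x="run", y="row_count", kind="line", marker="o", title="Rows per run")
ax.axhline(mean_rows, linestyle="--")  # mean, drawn for reference
plt.close(ax.figure)
```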
<aside> Curious Geek: Anscombe's quartet
Anscombe's quartet is a set of four datasets with nearly identical summary statistics (means, variances, correlation) but very different plots. Visualization protects you from false confidence.
</aside>
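The same idea can be shown on a smaller scale. The sketch below does not use Anscombe's actual data; it uses two invented samples that share the same mean and population standard deviation, yet have clearly different shapes once plotted as histograms.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Two made-up samples (NOT Anscombe's data) with the same mean and spread.
samples = pd.DataFrame(
    {
        "spread_out": [2, 4, 4, 4, 5, 5, 7, 9],
        "two_clusters": [3, 3, 3, 3, 7, 7, 7, 7],
    }
)

means = samples.mean()
stds = samples.std(ddof=0)  # population standard deviation

# One histogram per column; the shapes differ even though the statistics match.
axes = samples.plot(kind="hist", subplots=True, bins=7, sharex=True)
plt.close("all")
```

Here both columns have a mean of 5 and a population standard deviation of 2, so a summary table would make them look interchangeable.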
Aggregations often become charts.
daily = sales.groupby("date", as_index=False)["amount"].sum()
daily.plot(x="date", y="amount", kind="line")
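The chapter's `sales` table already has one row per date, so the grouping step is easier to see on data with several rows per day. The following sketch assumes a hypothetical `order_lines` table for that purpose.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical order-level data: several rows share a date.
order_lines = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-03"],
        "amount": [70, 50, 90, 100, 50],
    }
)

# One row per date; as_index=False keeps "date" as a regular column for plotting.
daily_totals = order_lines.groupby("date", as_index=False)["amount"].sum()

ax = daily_totals.plot(x="date", y="amount", kind="line", title="Daily revenue")
plt.close(ax.figure)
```

Five order rows collapse into three daily totals, and only the aggregated frame is plotted.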
Use Matplotlib directly when you need labels, size control, or export.
ax = daily.plot(x="date", y="amount", kind="line", figsize=(6, 3))
ax.set_xlabel("Date")
ax.set_ylabel("Revenue")
plt.tight_layout()
plt.savefig("output/daily_revenue.png")
plt.close()
<aside> 💡 Saving charts to files is useful for automated reports and dashboards.
</aside>
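Another way to get the same control is to create the Figure and Axes yourself with `plt.subplots` and pass the Axes to `DataFrame.plot` through its `ax` parameter. The sketch below assumes a small `daily` frame like the one above; the file name and `dpi` value are choices made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

daily = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "amount": [120, 90, 150],
    }
)

Path("output").mkdir(exist_ok=True)

# Create the figure explicitly, then tell pandas which Axes to draw on.
fig, ax = plt.subplots(figsize=(6, 3))
daily.plot(x="date", y="amount", kind="line", ax=ax, legend=False)
ax.set_xlabel("Date")
ax.set_ylabel("Revenue")
ax.set_title("Daily revenue")
fig.tight_layout()

# The file name and dpi are assumptions for this sketch.
chart_path = Path("output") / "daily_revenue_explicit.png"
fig.savefig(chart_path, dpi=150)
plt.close(fig)  # release the figure so long-running jobs do not accumulate them
```

Holding on to `fig` and `ax` avoids relying on Matplotlib's implicit "current figure", which tends to be easier to reason about when a script produces several charts.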
Use this sample table. Group by region, sum amount, and plot a bar chart. Return the Axes object so it can be inspected.
import matplotlib
matplotlib.use("Agg") # headless: must come before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd
orders = pd.DataFrame(
    {
        "region": ["NL", "NL", "BE", "DE"],
        "amount": [120, 80, 200, 50],
    }
)
<aside> Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=4&chapter=visualizing_data&exercise=w4_visualizing_data__daily_totals&lang=python
</aside>
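If it is unclear what "inspecting" an Axes object means, the sketch below shows one way to read values back from a bar chart. It deliberately uses a different, hypothetical table (`stock`) rather than the exercise data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, different from the exercise table.
stock = pd.DataFrame(
    {
        "product": ["apple", "pear", "plum"],
        "units": [30, 12, 45],
    }
)

ax = stock.plot(x="product", y="units", kind="bar", legend=False)

# Each bar is a Rectangle patch; its height is the plotted value.
bar_heights = [patch.get_height() for patch in ax.patches]
# The x tick labels carry the category names.
tick_labels = [label.get_text() for label in ax.get_xticklabels()]

plt.close(ax.figure)
```

Checks like these make it possible to test a plotting function without looking at the image.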
Save a chart to output/ and verify the file exists.
<aside> When your data grows past what Pandas handles comfortably, see the optional Alternatives to Pandas chapter or the Going Further page for deep dives on Polars and Dask.
</aside>
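For the task of saving a chart to output/ and verifying the file, one possible check is sketched below. The helper `save_and_check`, its sample data, and the file name are all hypothetical; the point is only that `Path.exists()` and the file size give a cheap sanity check after `savefig`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path


def save_and_check(frame: pd.DataFrame, path: Path) -> int:
    """Save a bar chart of `frame` to `path` and return the file size in bytes.

    Hypothetical helper for illustration; raises if the file is missing or empty.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    ax = frame.plot(kind="bar")
    ax.figure.savefig(path)
    plt.close(ax.figure)

    if not path.exists() or path.stat().st_size == 0:
        raise RuntimeError(f"chart was not written: {path}")
    return path.stat().st_size


example = pd.DataFrame({"value": [3, 1, 2]}, index=["a", "b", "c"])
size_bytes = save_and_check(example, Path("output") / "check_example.png")
```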
- When would you use kind="hist" instead of kind="line"?
- Why call plt.savefig("output/chart.png") instead of plt.show()?
Test your recall before moving on.
<aside> Try it in the widget: Interactive Quiz: Visualizing Data with Pandas
</aside>
https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_4_ch8_visualizing_data_quiz&embed=1
If Matplotlib's API felt unfamiliar, this video walks through creating and customizing your first plots from scratch.
<aside> 🎬 Struggling with this concept? Watch this beginner-friendly video:
</aside>
https://www.youtube.com/watch?v=UO98lJQ3QGI
You can also describe your chart goal to an LLM to get a starting point.
<aside> 💡 Using AI to help: Paste a description of your chart goal, for example "bar chart of revenue per region, sorted descending, saved to output/", and ask an LLM to write the Matplotlib code. ⚠️ Make sure the description contains no PII or sensitive company data. Run the code and tweak the labels before using it in a report.
</aside>
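For the example goal quoted above, the code you get back might look roughly like the following. This is a sketch with invented data (the `revenue` table and the output file name are assumptions), not a reference answer; treat any generated code the same way and read it before running it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

# Hypothetical input; swap in your own non-sensitive data.
revenue = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south"],
        "amount": [40, 10, 60, 70, 20],
    }
)

per_region = (
    revenue.groupby("region")["amount"]
    .sum()
    .sort_values(ascending=False)  # tallest bar first
)

ax = per_region.plot(kind="bar", title="Revenue per region")
ax.set_xlabel("Region")
ax.set_ylabel("Revenue")

Path("output").mkdir(exist_ok=True)
ax.figure.tight_layout()
ax.figure.savefig(Path("output") / "revenue_per_region.png")
plt.close(ax.figure)
```

Things worth checking in generated code include whether the sort happens after the aggregation, whether the axis labels match your data, and whether the figure is closed after saving.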
Ready to apply these skills? Try the practice exercise before moving on.
<aside> ⌨️ Hands on: Practice with Exercise 7: Visualize Revenue.
</aside>
Next up: Alternatives to Pandas, where you learn when Pandas is still the right tool and when a larger workload may need something else.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.