Project Brief

This is your mid-track project. You have one week to design, build, and deploy a data pipeline that runs as a Container App Job on Azure and stores its results in Azure storage. You pick the data source and use case. The technical requirements and deadline are in the chapters below.

The week ends with a technical interview where you present your project and explain your decisions.

What you are building

A complete data pipeline that:

Ingests data from an external API or public dataset
Validates the data before storing it
Stores results in Azure Postgres, Azure Blob Storage, or both
Runs as a Container App Job on Azure (not on your laptop)

This is the same architecture you built in Week 6, but with a data source and use case you choose yourself.

┌─── Azure Container App Job ────────────────────────────────────┐
│  External API ──► pipeline.py ──► Pydantic ──► Azure Storage   │
└────────────────────────────────────────────────────────────────┘

<aside> 💡 In the wild: Open-source tools like dlt (data load tool) follow the same fetch-validate-store pattern you are building this week. Your project is a simplified version of what production data teams run every day.

</aside>

Choosing your data source

Pick something you find interesting. The best projects come from genuine curiosity. Your data source must be:

Free to access (no paid API keys)
Reliable (available when you need it, not flaky or rate-limited to 10 requests/day)
Structured (returns JSON or CSV you can parse)

<aside> ⚠️ Verify your data source works on Day 1. Call the API, inspect the response, and confirm you can parse it. Do not discover on Day 4 that the API requires OAuth or returns HTML instead of JSON.

</aside>
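
You can script the Day 1 check. The sketch below uses a hypothetical helper named `parse_body`. It reads the `Content-Type` header, parses the body as JSON or CSV, and raises an error for anything else, such as an HTML error page. It does no fetching itself, so you can run it on any response you save from your browser or `curl`.

```python
import csv
import io
import json


def parse_body(content_type: str, body: str):
    """Parse an API response body according to its Content-Type header.

    Hypothetical Day 1 smoke-test helper: returns parsed JSON (dict or list)
    or a list of CSV rows as dicts, and raises ValueError for anything else.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    if media_type in ("text/csv", "application/csv"):
        return list(csv.DictReader(io.StringIO(body)))
    # Anything else (often text/html from a login or error page) is a red flag.
    raise ValueError(f"Expected JSON or CSV, got {media_type!r}: {body[:80]!r}")
```

With `urllib.request.urlopen`, the header is available as `response.headers.get("Content-Type", "")` and the body as `response.read().decode("utf-8")`.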

Example project ideas

These are starting points, not requirements. You can combine ideas or invent your own.

Project	Data Source	What to Store
Weather tracker	Open-Meteo API (no key needed)	Hourly forecasts for Amsterdam, Rotterdam, Utrecht
Dutch weather history	KNMI Open Data (no key needed)	Hourly temperature and wind from Dutch weather stations
GitHub activity monitor	GitHub REST API (no key for public repos)	Commit counts and PR stats for repos you follow
Cryptocurrency prices	CoinGecko API (no key needed)	Price snapshots for top 10 coins in EUR
Eredivisie standings	Football-Data.org (free key required)	Dutch league tables and match results
Dutch public transport	OV API (no key needed, HTTP only)	Real-time departures and delays for Dutch stops
Dutch population stats	CBS Open Data (no key needed)	Demographics, migration, and household statistics
Space launches	Launch Library 2 (no key needed)	Upcoming rocket launches with status and location

<aside> 💡 If you are stuck choosing, start with Open-Meteo. It requires no API key, returns clean JSON, and the Week 6 examples already use weather data. You can focus on the pipeline and deployment instead of fighting with API authentication.

</aside>
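
Here is what "clean JSON" looks like in practice, using the weather-tracker idea. The sketch builds a URL for Open-Meteo's `/v1/forecast` endpoint. It then converts the `hourly` block, which stores each variable as an array parallel to a `time` array, into one row per hour. The city coordinates are rounded approximations, and `forecast_url` and `hourly_rows` are hypothetical helper names. Check the Open-Meteo documentation for the variables you actually want.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open-meteo.com/v1/forecast"

# Rounded city-centre coordinates (approximate, for illustration only).
CITIES = {
    "Amsterdam": (52.37, 4.90),
    "Rotterdam": (51.92, 4.48),
    "Utrecht": (52.09, 5.12),
}


def forecast_url(latitude: float, longitude: float, variable: str = "temperature_2m") -> str:
    """Build a forecast URL requesting one hourly variable in Dutch local time."""
    query = urlencode({
        "latitude": latitude,
        "longitude": longitude,
        "hourly": variable,
        "timezone": "Europe/Amsterdam",
    })
    return f"{BASE_URL}?{query}"


def hourly_rows(city: str, payload: dict, variable: str = "temperature_2m") -> list:
    """Zip the parallel 'time' and variable arrays into one dict per hour."""
    hourly = payload["hourly"]
    times, values = hourly["time"], hourly[variable]
    if len(times) != len(values):
        raise ValueError(f"{len(times)} timestamps but {len(values)} values")
    return [{"city": city, "time": t, variable: v} for t, v in zip(times, values)]
```

In a pipeline, you would loop over `CITIES`, fetch each URL, and pass the parsed JSON to `hourly_rows` before validation.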

Scope guidance

Keep it focused. A working pipeline with one data source and one storage target is a better project than an ambitious plan that is half-finished.

Good scope:

One API, one pipeline script, one table in Postgres
Fetches data, validates with Pydantic, inserts rows, logs the count
Runs as a Container App Job on Azure

Too ambitious for one week:

Three APIs merged into a data warehouse with transformations and a dashboard
Real-time streaming with event triggers
A web frontend that displays the data

You can always add stretch goals after the core pipeline works end to end.
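
The good-scope pipeline (validate, insert rows, log the count) fits in a few functions. To keep the sketch runnable with only the standard library, it uses two stand-ins. A `dataclass` with a hand-written `__post_init__` check takes the place of your Pydantic model, and an in-memory SQLite database takes the place of Azure Postgres. The fields, the temperature range, and the `readings` table are hypothetical. In your project, you would declare the model with Pydantic and insert with a Postgres driver.

```python
import logging
import sqlite3
from dataclasses import dataclass
from datetime import datetime

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


@dataclass(frozen=True)
class Reading:
    """One validated row; stand-in for a Pydantic model (hypothetical fields)."""
    city: str
    time: datetime
    temperature_c: float

    def __post_init__(self):
        if not self.city:
            raise ValueError("city must not be empty")
        if not -60 <= self.temperature_c <= 60:
            raise ValueError(f"temperature out of range: {self.temperature_c}")


def validate(raw_rows: list) -> list:
    """Keep rows that pass validation; log and skip the rest."""
    valid = []
    for raw in raw_rows:
        try:
            valid.append(Reading(
                city=raw["city"],
                time=datetime.fromisoformat(raw["time"]),
                temperature_c=float(raw["temperature_2m"]),
            ))
        except (KeyError, TypeError, ValueError) as exc:
            log.warning("Skipping invalid row %r: %s", raw, exc)
    return valid


def store(conn: sqlite3.Connection, readings: list) -> int:
    """Insert readings in one transaction and return how many were written."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (city TEXT, time TEXT, temperature_c REAL)"
        )
        conn.executemany(
            "INSERT INTO readings (city, time, temperature_c) VALUES (?, ?, ?)",
            [(r.city, r.time.isoformat(), r.temperature_c) for r in readings],
        )
    return len(readings)


def run(raw_rows: list, conn: sqlite3.Connection) -> int:
    """Validate, store, and log the count, as in the good-scope list."""
    readings = validate(raw_rows)
    count = store(conn, readings)
    log.info("Inserted %d of %d rows", count, len(raw_rows))
    return count
```

This sketch logs and skips invalid rows instead of crashing the whole run. Whether to skip or fail is a design decision you should be ready to explain in the technical interview.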

Timeline

Day	Milestone
1	Pick data source, verify API works, scaffold project structure, deploy hello-world container to Azure
2-3	Pipeline works locally: ingests, validates, stores in Postgres/Blob
4	Replace hello-world with real pipeline, push to ACR, confirm job runs end to end
5	Polish, finalize README, prepare for technical interview

<aside> ⌨️ Hands on: Deploy the hello-world container on Day 1, before your pipeline code is written. This proves your Azure setup works early. If you hit firewall issues or image pull errors, you have four days to fix them instead of four hours.

</aside>
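
The hello-world image can be a single Python file. The sketch below logs to stdout, where Container Apps collects console output. It relies on Python's default exit codes: a script that finishes normally exits with 0, and one that stops on an uncaught exception exits with 1. The job uses that exit code to decide whether an execution succeeded. `JOB_GREETING` is a hypothetical environment variable showing how configuration can be passed in from the job definition.

```python
import logging
import os
import sys

# Log to stdout so the output shows up in the job's console logs.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)


def main() -> str:
    """Hello-world job: log one line and return the message for easy testing."""
    # JOB_GREETING is a hypothetical env var; set it in the job's configuration.
    message = os.environ.get("JOB_GREETING", "Hello from the Container App Job")
    logging.info(message)
    return message


if __name__ == "__main__":
    # Returning normally exits with code 0 (success). An uncaught exception
    # exits with code 1, which the job records as a failed execution.
    main()
```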

Getting started

A starter template is available in assets/starter-template/ in the course GitHub repository. Download or clone the repo, then copy the contents of assets/starter-template/ into the root of your new repository. Do not commit starter-template/ as a subfolder: root-level paths like .github/workflows/ must stay at the repository root. Then start replacing the stubs with your own logic. See Chapter 2 for the full project structure and requirements checklist.

<aside> 💡 Using AI to help: Use an LLM to explore API documentation, generate Pydantic models from example JSON responses, or draft your README. Document what you used in AI_ASSIST.md. ⚠️ Do not paste PII or sensitive company data into your prompts.

</aside>

Extra reading

Public APIs list: curated directory of free APIs for projects

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*


Built with ❤️ by the HackYourFuture community · Thank you, contributors
