Week 7 - Mid-Track Project

Project Brief

Project Guidelines and Requirements

Week 7 Gotchas & Pitfalls

Project Brief

This is your mid-track project. You have one week to design, build, and deploy a data pipeline that runs as a Container App Job on Azure and stores its results in Azure storage. You pick the data source and use case. The technical requirements and deadline are in the chapters below.

The week ends with a technical interview where you present your project and explain your decisions.

What you are building

A complete data pipeline that:

  1. Ingests data from an external API or public dataset
  2. Validates the data before storing it
  3. Stores results in Azure Postgres, Azure Blob Storage, or both
  4. Runs as a Container App Job on Azure (not on your laptop)

This is the same architecture you built in Week 6, but with a data source and use case you choose yourself.

┌─── Azure Container App Job ────────────────────────────────────┐
│  External API ──► pipeline.py ──► Pydantic ──► Azure Storage   │
└────────────────────────────────────────────────────────────────┘
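The fetch-validate-store flow above can be sketched in a few lines. This is a minimal illustration, assuming the `pydantic` package is installed; the model name and fields (`HourlyTemperature`, `city`, `temperature_c`) are made up for the example, not required names, and the fetch and store steps are elided:

```python
from datetime import datetime
from pydantic import BaseModel, Field

# Illustrative model -- adjust the fields to match whatever API you pick.
class HourlyTemperature(BaseModel):
    city: str
    measured_at: datetime
    temperature_c: float = Field(ge=-60, le=60)  # reject implausible readings

def validate_rows(raw_rows: list[dict]) -> list[HourlyTemperature]:
    """Validate raw API rows; a bad row raises instead of being stored silently."""
    return [HourlyTemperature(**row) for row in raw_rows]

# A row as it might come back from your ingestion step.
raw = [{"city": "Amsterdam",
        "measured_at": "2024-05-01T12:00:00+00:00",
        "temperature_c": 14.2}]
rows = validate_rows(raw)
print(rows[0].city, rows[0].temperature_c)
```

The point of validating before storing is that a schema change upstream fails your job loudly instead of quietly filling your database with garbage.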

<aside> 💡 In the wild: Open-source tools like dlt (data load tool) follow the same fetch-validate-store pattern you are building this week. Your project is a simplified version of what production data teams run every day.

</aside>

Choosing your data source

Pick something you find interesting. The best projects come from genuine curiosity. Your data source must be:

<aside> ⚠️ Verify your data source works on Day 1. Call the API, inspect the response, and confirm you can parse it. Do not discover on Day 4 that the API requires OAuth or returns HTML instead of JSON.

</aside>
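One way to do that Day 1 check, sketched with only the standard library (the Open-Meteo URL below is an example; swap in your candidate endpoint):

```python
import json
from urllib.request import urlopen

def parse_json_or_fail(body: str, source: str) -> dict:
    """Fail loudly if the response body is HTML or anything else non-JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # An HTML login page or error page shows up here, not on Day 4.
        raise SystemExit(f"{source} did not return JSON. First 200 chars:\n{body[:200]}")

def probe(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and return the parsed JSON, or exit with a clear message."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_json_or_fail(resp.read().decode("utf-8"), url)

if __name__ == "__main__":
    data = probe("https://api.open-meteo.com/v1/forecast"
                 "?latitude=52.37&longitude=4.89&hourly=temperature_2m")
    print(sorted(data.keys()))
```

Run it once against your chosen API, eyeball the keys it prints, and you know on Day 1 whether the response is something you can actually parse.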

Example project ideas

These are starting points, not requirements. You can combine ideas or invent your own.

| Project | Data Source | What to Store |
| --- | --- | --- |
| Weather tracker | Open-Meteo API (no key needed) | Hourly forecasts for Amsterdam, Rotterdam, Utrecht |
| Dutch weather history | KNMI Open Data (no key needed) | Hourly temperature and wind from Dutch weather stations |
| GitHub activity monitor | GitHub REST API (no key for public repos) | Commit counts and PR stats for repos you follow |
| Cryptocurrency prices | CoinGecko API (no key needed) | Price snapshots for top 10 coins in EUR |
| Eredivisie standings | Football-Data.org (free key required) | Dutch league tables and match results |
| Dutch public transport | OV API (no key needed, HTTP only) | Real-time departures and delays for Dutch stops |
| Dutch population stats | CBS Open Data (no key needed) | Demographics, migration, and household data from CBS |
| Space launches | Launch Library 2 (no key needed) | Upcoming rocket launches with status and location |


<aside> 💡 If you are stuck choosing, start with Open-Meteo. It requires no API key, returns clean JSON, and the Week 6 examples already use weather data. You can focus on the pipeline and deployment instead of fighting with API authentication.

</aside>

Scope guidance

Keep it focused. A working pipeline with one data source and one storage target is a better project than an ambitious plan that is half-finished.

Good scope:

Too ambitious for one week:

You can always add stretch goals after the core pipeline works end to end.

Timeline

| Day | Milestone |
| --- | --- |
| 1 | Pick data source, verify API works, scaffold project structure, deploy hello-world container to Azure |
| 2-3 | Pipeline works locally: ingests, validates, stores in Postgres/Blob |
| 4 | Replace hello-world with real pipeline, push to ACR, confirm job runs end to end |
| 5 | Polish, finalize README, prepare for technical interview |

<aside> ⌨️ Hands on: Deploy the hello-world container on Day 1, before your pipeline code is written. This proves your Azure setup works early. If you hit firewall issues or image pull errors, you have four days to fix them instead of four hours.

</aside>
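The Day 1 hello-world image can be very small. This is a sketch, not the course's required Dockerfile: the base image tag, `requirements.txt`, and `pipeline.py` names are assumptions (on Day 1, `pipeline.py` can be a single `print` statement):

```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY pipeline.py .
# A Container App Job runs this command once per trigger, then exits.
CMD ["python", "pipeline.py"]
```

Once this builds, pushes to ACR, and runs as a job on Azure, every later deployment is just a new image with the same plumbing.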

Getting started

A starter template is available in assets/starter-template/ in the course GitHub repository. Download or clone the repo, then copy the contents of assets/starter-template/ into the root of your new repository. Do not commit starter-template/ as a subfolder: root-level paths like .github/workflows/ must stay at the repository root. Then start replacing the stubs with your own logic. See Chapter 2 for the full project structure and requirements checklist.

<aside> 💡 Using AI to help: Use an LLM to help you explore API documentation, generate Pydantic models from example JSON responses, or draft your README. Document what you used in AI_ASSIST.md. (⚠️ Ensure no PII or sensitive company data is included!)

</aside>

Extra reading


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 (https://hackyourfuture.net/).


Built with ❤️ by the HackYourFuture community · Thank you, contributors

Found a mistake or have a suggestion? Let us know in the feedback form.