Week 11 - Orchestration


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*


Built with ❤️ by the HackYourFuture community · Thank you, contributors



Welcome to Week 11! You have built pipelines that ingest, transform, and model data. Now it's time to make them run automatically and reliably. This week covers orchestration: scheduling your pipelines, chaining steps together, handling failures, and monitoring production runs.

By the end of this week, you will have a fully orchestrated pipeline that runs on a schedule, chains ingestion and transformation steps in the correct order, supports parameterized runs and backfills, and surfaces errors through Airflow's logs and the UI.

Why this matters for jobs

Junior data engineers in the Netherlands are usually trusted with existing pipelines before they build new ones. Teams expect you to:

- keep scheduled pipelines running reliably every morning
- investigate failed runs, read the logs, and rerun or backfill safely
- make small changes to existing DAGs without breaking the schedule

Week 11 trains exactly these skills. This is strong portfolio evidence because it shows operational ownership: not just writing code that works once, but running it reliably every morning.

For the NL-specific picture (which postings ask for Airflow, salary bands, how to talk about Week 11 work in interviews), see Career relevance.

Learning goals

By Friday, you should be able to explain:

- what orchestration is and why pipelines need a scheduler
- how Airflow schedules and triggers DAG runs
- how to chain sequential pipeline steps so they run in the correct order
- how parameterized runs and backfills work
- how to test, monitor, and debug DAGs, both locally and on the shared Airflow

The assignment has three tiers:

- Minimum (everyone ships it): a local DAG plus a runbook. This tier stays completable even if the class VM is offline.
- Target: adds the 7-run backfill and a successful deploy to the shared class Airflow (ingest_taxi_month green; dbt_run / dbt_test orange-by-design unless inline-installed, per Ch8).
- Stretch: bonus.
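To make "a 7-run backfill" concrete: with a daily schedule, backfilling an inclusive seven-day window produces one DAG run per logical date. The date arithmetic can be sketched in plain Python (the function name and dates here are illustrative):

```python
from datetime import date, timedelta


def backfill_dates(start: date, end: date) -> list[date]:
    """Logical dates a daily-schedule backfill covers, inclusive of both ends."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# A seven-day window yields exactly seven runs, one per logical date.
window = backfill_dates(date(2024, 1, 1), date(2024, 1, 7))
print(len(window))  # → 7
```

In Airflow itself you trigger this with the CLI's `airflow dags backfill` command and its `--start-date` / `--end-date` flags; the Parameterized Runs and Backfills chapter covers the exact invocation.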

<aside> 📘 Core program connection: Week 11 extends the Core-program git lesson from "pushing to main affects everyone on the repo" to "pushing to main affects everyone's data pipeline." Deploying to Shared Airflow walks through the same PR-merge-deploy loop you already know, now with an Airflow scheduler picking up the result.

</aside>


Prerequisites

All tiers:

- Docker and the Astro CLI installed and working locally
- your ingestion and transformation code from earlier weeks

Target tier adds:

- access to the shared class repo, so you can open the deploy PR (Ch8)
- the class Azure VM being online and reachable

If a prerequisite is missing, continue with the local exercises first and ask your teacher for a fallback plan during class.

Development environments

This week uses two environments:

| Environment | Tool | Purpose |
| --- | --- | --- |
| Local | Astro CLI (astro dev start) | Write and test DAGs on your machine |
| Class | Shared Azure VM (Airflow + Docker Compose) | Class demos + student-deployed DAGs; everyone runs on one scheduler (Target tier onward) |

You write and iterate on DAGs locally. At Target tier you deploy the finished DAG to the class VM via a PR to the shared repo (Ch8). Minimum-tier students can skip the VM entirely.

Plain-language glossary

A handful of terms you will meet in almost every chapter:

- DAG: a pipeline definition; a set of tasks plus the dependencies between them, with no cycles
- Task: one unit of work inside a DAG, such as an ingestion step
- Operator: the template that defines what a task does (run Python code, run a shell command, and so on)
- DAG run: one execution of a DAG for a specific logical date
- Backfill: running a DAG for a range of past logical dates to fill in historical data

The full list (scheduler, dag-processor, XCom, AIRFLOW_STUDENT, connections, hooks, tag namespacing, and more) lives on the Week 11 Glossary page.

<aside> 💡 If these words feel new, that is normal. Keep the glossary open in a side tab while you read chapters and add your own one-line notes to the terms you look up more than once.

</aside>

Before submitting the assignment, scan Gotchas & Pitfalls. It is a Week-11-specific index of the ~10 traps students hit most often, each linked back to the chapter that teaches the fix.

Chapters

  1. Introduction to Orchestration
  2. Airflow Fundamentals
  3. Scheduling and Triggers
  4. Sequential Pipeline Steps
  5. Parameterized Runs and Backfills
  6. Testing DAGs
  7. Monitoring and Debugging
  8. Deploying to Shared Airflow
  9. Practice
  10. Gotchas & Pitfalls
  11. Assignment: Build an Orchestrated Data Pipeline

Lesson plan

Glossary

Career relevance: Week 11 in the NL data job market

Going Further: Optional Deep Dives

History of Data Orchestration