Week 11 - Orchestration

Introduction to Orchestration

Airflow Fundamentals

Scheduling and Triggers

Sequential Pipeline Steps

Parameterized Runs and Backfills

Testing DAGs

Monitoring and Debugging

Deploying to Shared Airflow

Practice

Gotchas & Pitfalls

Assignment: Build an Orchestrated Data Pipeline

Week 11 Lesson Plan (Teachers)

Practice

These exercises help you turn Week 11 concepts into working Airflow behavior. Complete them in order. Each exercise builds toward the assignment.

Every exercise reuses the taxi_pipeline you built across Sequential Pipelines, Parameterized Runs and Backfills, Testing DAGs, and Monitoring and Debugging. You are not building a new DAG here; you are exercising the one you already have. If yours is broken, diff against assets/dag_snapshots/taxi_pipeline.py and start from the reference.

<aside> 📝 Practice is optional, but strongly recommended before starting the assignment.

</aside>

Before you run any exercise, set AIRFLOW_STUDENT=yourname in your Astro project's .env and restart the stack (astro dev restart). Without it, your DAG writes to airflow_default on the shared Azure Postgres and collides with every other student who also forgot. Sequential Pipelines covers the why; Practice just needs you to verify the env var is live before Exercise 1.

Suggested pacing

<aside> 💭 If you get blocked for more than 20 minutes, write what you tried, ask for help, and continue with the next exercise. This is professional behavior, not failure.

</aside>

Exercise 1: Run the taxi pipeline locally

Concepts: Astro CLI, DAG discovery, manual trigger.

Instructions:

  1. From the Astro project you used during Sequential Pipelines, run astro dev start.
  2. Open the Airflow UI URL Astro printed in your terminal (the http://<project-dirname>.localhost:<random-port> line, not http://localhost:8080; see Airflow Fundamentals) and confirm taxi_pipeline appears in the DAG list.
  3. Unpause the DAG, then trigger a run for 2024-01-01. Airflow 3's plain Trigger button leaves logical_date unset (see Airflow macros and runtime context), so use one of these instead: in the UI, click the dropdown next to Trigger → Trigger DAG w/ config and paste {"logical_date": "2024-01-01T00:00:00+00:00"}; or from the terminal run astro dev run dags trigger taxi_pipeline -e 2024-01-01.
  4. Watch the Grid view until ingest_taxi_month, dbt_run, and dbt_test are all green. Inspect the dbt_test log and verify the Done. PASS=9 WARN=2 summary from Week 10.

Success criteria: One green run for 2024-01-01 in your airflow_<yourname> schema, with raw_trips populated and fct_trips rebuilt.
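The conf payload from step 3 is just an ISO-8601 UTC timestamp. If you ever need to build it programmatically instead of pasting it, a timezone-aware datetime produces exactly the right string (illustrative snippet, not part of the DAG):

```python
import json
from datetime import datetime, timezone

# Build the logical_date payload for "Trigger DAG w/ config".
# isoformat() on a UTC-aware datetime ends in +00:00, which is the
# form shown in step 3.
logical_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
conf = {"logical_date": logical_date.isoformat()}

print(json.dumps(conf))
```

Building the string this way avoids the classic copy-paste mistake of a naive timestamp with no offset.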

Exercise 2: Add scheduling and retries

Concepts: cron schedule, retry configuration.

Instructions:

  1. The reference taxi_pipeline uses schedule="@monthly" and default_args={"retries": 2}. Override these per-instance on your own DAG so the scheduler fires at 06:00 UTC on the first of each month, not midnight: change schedule to the cron expression "0 6 1 * *".
  2. Add retry_delay=timedelta(minutes=2) next to retries in default_args.
  3. Reserialize (astro dev run dags reserialize) and open the DAG's Details tab in the UI to confirm both settings are applied.

Success criteria: DAG metadata shows schedule 0 6 1 * *, retries=2, retry_delay=0:02:00.
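The two settings you just changed look like this in the DAG file. This sketch shows only the relevant values, outside any `@dag` decorator, so you can sanity-check them in isolation; in your real DAG they are passed as the `schedule` and `default_args` keyword arguments:

```python
from datetime import timedelta

# Cron fields: minute hour day-of-month month day-of-week.
# "0 6 1 * *" fires at 06:00 UTC on the 1st of every month.
SCHEDULE = "0 6 1 * *"

# Every task in the DAG inherits these via default_args.
default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=2),
}

# str(timedelta(minutes=2)) renders as "0:02:00" -- the exact value
# the Details tab should show per the success criteria above.
print(SCHEDULE, default_args["retries"], str(default_args["retry_delay"]))
```

If the Details tab shows anything other than these three values, the scheduler is still serving the old serialized DAG; rerun the reserialize step.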

Exercise 3: Add a quality gate between ingest and dbt

Concepts: task dependencies, TaskFlow return values, >> chaining.

Instructions:

  1. Add a check_row_count task between ingest_taxi_month and dbt_run. It takes the row count returned by ingest_taxi_month (an int, not a file path: XCom handles ints cleanly) and raises ValueError if the count is below 1000.
  2. Wire it into the chain so the run order is ingest_taxi_month → check_row_count → dbt_run → dbt_test. In TaskFlow style, the data dependency creates the first edge for you: calling check_row_count(ingest_taxi_month()) links the two tasks, and the task reference that call returns chains onward with >> dbt_run >> dbt_test.
  3. Trigger the DAG normally. All four tasks should go green.
  4. Confirm the Graph view shows the new four-node chain.

Success criteria: Four-node chain visible in Graph view; happy-path run is green; all dependencies fire in order.
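Stripped of the `@task` decorator, the gate's logic is a plain function. A minimal sketch, using the 1000-row threshold from step 1 (the docstring notes the assumption about how it sits in the DAG):

```python
def check_row_count(row_count: int, minimum: int = 1000) -> int:
    """Fail fast if the ingested month looks suspiciously small.

    In the DAG this body lives inside an @task-decorated function and
    row_count arrives via XCom from ingest_taxi_month, which returns
    an int (ints serialize cleanly through XCom).
    """
    if row_count < minimum:
        raise ValueError(
            f"ingest returned only {row_count} rows (minimum {minimum}); "
            "refusing to run dbt on a partial load"
        )
    return row_count


# Happy path: the count passes through unchanged, so downstream
# tasks could consume it via XCom if they wanted to.
print(check_row_count(5000))
```

Returning the count (rather than nothing) keeps the XCom chain intact in case a later task wants the same number.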

Exercise 4: Trigger multiple months and compare ds

Concepts: {{ ds }} templating, parameterized runs.

Instructions:

  1. Manually trigger taxi_pipeline for 2024-01-01, then 2024-02-01, then 2024-03-01 (use the "Trigger DAG w/ config" option if your Airflow version requires an explicit logical date for manual triggers).
  2. Open each run's ingest_taxi_month task and find the ds value in the task log (it will show up in the URL built by parquet_url_for(ds)).
  3. Query airflow_<yourname>.raw_trips grouped by to_char(lpep_pickup_datetime, 'YYYY-MM'). You should see three rows: 2024-01, 2024-02, 2024-03, each with the correct row count for that month's TLC parquet.

Success criteria: Three distinct ds values across three runs; three distinct year-month buckets in raw_trips.
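The reason each run hits a different parquet file is that the URL is derived from ds. A sketch of what a helper like parquet_url_for does, assuming the filename pattern from the reference DAG; TLC_BASE here is a placeholder, substitute your DAG's real constant:

```python
# Placeholder value -- your DAG defines the real TLC_BASE.
TLC_BASE = "https://example.invalid/tlc"


def parquet_url_for(ds: str) -> str:
    """Map Airflow's {{ ds }} (YYYY-MM-DD) to that month's parquet URL.

    The year-month bucket is simply the first seven characters of ds,
    so three runs with three ds values hit three distinct files.
    """
    year_month = ds[:7]  # "2024-02-01" -> "2024-02"
    return f"{TLC_BASE}/green_tripdata_{year_month}.parquet"


print(parquet_url_for("2024-02-01"))
```

This is also why the ds value is visible in the task log: it is baked into the URL the ingest task fetches.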

Exercise 5: Controlled backfill with idempotency proof

Concepts: airflow backfill create, idempotent DELETE-then-append.

Instructions:

  1. Run astro dev run backfill create --dag-id taxi_pipeline --from-date 2023-11-01 --to-date 2024-01-31 --max-active-runs 1. Wait for all three runs to go green.
  2. Query raw_trips grouped by year-month: you should see Nov 2023, Dec 2023, Jan 2024.
  3. Re-run the exact same backfill command. The three runs should go green again, and the row counts per year-month should not change.

Success criteria: Three year-month buckets in raw_trips after the first backfill, identical counts after the second. DELETE-then-append keeps the pipeline idempotent.
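Why the second backfill leaves the counts untouched: each run deletes its own month's slice before appending. A toy model of that contract using an in-memory dict (in the real task this is a DELETE ... WHERE on the month's rows followed by a bulk insert into raw_trips):

```python
# month -> rows; stands in for the raw_trips table.
table: dict[str, list[int]] = {}


def load_month(month: str, rows: list[int]) -> None:
    """DELETE-then-append: running twice for the same month is a no-op."""
    table.pop(month, None)          # DELETE this month's slice first...
    table.setdefault(month, []).extend(rows)  # ...then append fresh rows


load_month("2023-11", [101, 102, 103])
load_month("2023-11", [101, 102, 103])  # re-running the backfill

# Still 3 rows, not 6: the pipeline is idempotent per month.
print(len(table["2023-11"]))
```

A plain append without the delete would double the counts on every re-run, which is exactly what step 3 is checking for.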

Exercise 6: Reproduce the Ch7 403 failure

Concepts: log investigation, raise_for_status, 403 vs transient failures.

Instructions:

  1. Edit parquet_url_for in your DAG: append _typo to the filename (as in Monitoring and Debugging): f"{TLC_BASE}/green_tripdata_{year_month}_typo.parquet".
  2. Trigger the DAG. ingest_taxi_month goes red.
  3. Open the task log. Scroll to the bottom, then read upward: the first useful line is HTTPError: 403 Client Error: Forbidden for url: ....
  4. Add one line to your RUNBOOK.md: symptom, check, fix.
  5. Revert the typo. Trigger again. Confirm the run goes green.

Success criteria: You read the 403 from the log without scrolling all over the UI, you wrote one runbook entry, and the revert-and-rerun round-trip completed cleanly.

<aside> 🤓 Curious Geek: Blameless post-mortems

The runbook entry you just wrote is the seed of a post-mortem: a written account of what broke, when, why, and what changes prevent a repeat. Mature data teams write one for every customer-impacting incident, and they write them blamelessly: the document names systems and decisions, never people. The reasoning, popularised by Google's SRE practice, is that engineers who fear being named hide context, and hidden context is what causes the next incident. A weekly review of post-mortems is how a team's collective knowledge grows faster than any single member's experience.

</aside>

Exercise 7: Deploy to the shared Airflow

Concepts: shared-infra workflow, deploy-on-push, multi-student etiquette (from Deploying to Shared Airflow).

Instructions:

  1. Take the DAG you iterated on in Exercises 1-6 and copy it into your subdirectory (dags/<yourname>/) of the class's shared Airflow repo.
  2. Namespace the dag_id to <yourname>_<dag_name> and add "student:<yourname>" to the tags list.
  3. Run astro dev pytest tests/test_dag_integrity.py --args "-v" locally before pushing.
  4. Open a pull request, merge once CI (or your local integrity test) is green, and wait up to 60 seconds for the shared dag-processor to re-parse.
  5. Filter the shared UI by your student:<yourname> tag, unpause your DAG, and trigger one manual run. Verify ingest_taxi_month goes green; dbt_run and dbt_test are expected orange (Up For Retry) on the shared VM unless you inline-install dbt or the teacher has rebuilt the image: see Deploying to Shared Airflow for the rationale.
  6. Find a classmate's DAG on the same UI (different student: tag). Note that you can read it but must not pause or trigger it. Write one sentence on what you observed about their DAG's shape.

Success criteria: Your DAG is parsed by the shared scheduler, ingest_taxi_month runs green and writes to your own airflow_<yourname> schema, the DAG is discoverable via tag filter, and you have not touched any classmate's DAG. (dbt_run / dbt_test orange-by-design on the stock shared VM is acceptable per Ch8.)
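The namespacing conventions from step 2 are mechanical enough to capture in a helper. A sketch, where the function name and the "alice" example are hypothetical, but the `<yourname>_<dag_name>` and `student:<yourname>` patterns are the ones the shared repo expects:

```python
def namespaced(student: str, dag_name: str) -> tuple[str, list[str]]:
    """Return the shared-repo dag_id and tags for a student DAG:
    dag_id is <yourname>_<dag_name>, tags include student:<yourname>."""
    return f"{student}_{dag_name}", [f"student:{student}"]


# "alice" is a stand-in student name for illustration.
dag_id, tags = namespaced("alice", "taxi_pipeline")
print(dag_id, tags)
```

Keeping both values derived from one student string means the tag filter in step 5 can never drift out of sync with the dag_id.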

Career reflection (10 minutes)

After finishing practice, write 5 to 8 lines:

  1. Which failure did you debug?
  2. What evidence proved your fix worked?
  3. Which skill from this week would you mention in a junior data engineering interview?

Add this reflection to your assignment report draft.

<aside> 💡 Using AI to help: Before Exercise 3, paste your taxi_pipeline (redacted: ⚠️ no credentials, no PII, no real database host) into an LLM and ask it to suggest where a quality-gate task could sit without breaking the idempotency contract. Review the suggestion before coding: LLMs occasionally recommend placing the gate after dbt_run, which defeats the purpose of failing fast on bad data.

</aside>

Extra reading


In the next chapter, you will scan the Week 11 gotchas before starting your assignment. Read them fresh: you have just practiced enough that most of them will jump out as "oh, I nearly did that" rather than "I have no idea what this warning is about."