Week 10 - SQL Transformations with dbt

Introduction to dbt Core

dbt Setup for Azure PostgreSQL

SQL and Jinja Templating

dbt Tests

Incremental Models, Docs, and Snapshots

Building Your First Mart

Week 10 Gotchas & Pitfalls

Practice

Week 10 Assignment: Build a Daily Borough Stats Mart with dbt

Week 10 - Video Guide

Week 10 Lesson Plan (Teachers)

Content

Practice

<aside> πŸ“ These exercises are optional but strongly recommended. Each one drills a single skill that the chapter hands-on sections did not cover end-to-end. Work through them in order: Exercise 1 is a hard prerequisite for the fct_trips mart, and the later exercises assume the project state from Ch2–Ch6.

</aside>

By the end of this chapter, you should have:

Exercise 1: Macros and computed columns for stg_trips (SQL and Jinja Templating)

Materializations & Layers builds the fct_trips mart assuming stg_trips exposes tip_pct, fare_per_mile, and payment_type_label. This exercise adds those columns. Do it before you run dbt build --select +fct_trips in the assignment.

  1. Create macros/safe_divide.sql with the macro from SQL and Jinja Templating.
  2. Update models/staging/stg_trips.sql to select two new derived columns:
  1. Add a payment_type_label column using the {% for %} mapping from the Jinja chapter (codes 1-6 β†’ human labels).
  2. Run dbt compile --select stg_trips and open target/compiled/nyc_taxi/models/staging/stg_trips.sql. Confirm every {{ ... }} has been resolved to plain SQL.
  3. Run dbt run --select stg_trips and spot-check:
   SELECT payment_type, payment_type_label, tip_pct, fare_per_mile
   FROM dev_<your_name>.stg_trips
   LIMIT 10;

Success criteria: every payment_type code resolves to a readable label, and tip_pct / fare_per_mile are non-null for rows with non-zero denominators.

Exercise 2: Write a singular test (dbt Tests)

Singular tests are plain .sql files that return the bad rows: introduced in dbt Tests but not drilled. Write one yourself.

  1. Create tests/assert_fare_amount_non_negative.sql containing a select against {{ ref('stg_trips') }} that returns every row whose fare_amount is negative. Recall: an empty result passes, any returned rows fail.
  2. Run dbt test --select test_type:singular. The test will fail with FAIL 182 against the January 2024 dataset: there are 182 rows where the TLC has recorded a negative fare (most often credit-card refunds or voided trips re-entered as corrections).
  3. Open the compiled test at target/compiled/nyc_taxi/tests/assert_fare_amount_non_negative.sql and run it directly in psql to inspect the offending rows. The compiled form is the "show me the bad rows" query you would write by hand: no debugging setup needed.
  4. Decide what to do: the test is working as designed, but a failing error-severity test blocks the pipeline. Real teams either (a) downgrade to warn severity by moving the test into YAML with a custom test or keeping it as a singular test and editing the singular-test config in dbt_project.yml, or (b) filter negative fares out in stg_trips so fct_trips never sees them. For this exercise, edit stg_trips.sql to add and fare_amount >= 0 to the where clause, re-run dbt run --select stg_trips, and re-run dbt test --select test_type:singular. The test should now pass.

Success criteria: you can show that the test originally failed with 182 rows, and after filtering negative fares out of stg_trips the test passes cleanly. You can state in one sentence why a singular test is the right tool for this kind of check rather than a generic YAML test.

Exercise 3: Debug a broken ref() (SQL and Jinja Templating)

When dbt run fails because of a bad ref(), the error lands at parse or compile time: not at run time: and the fix lives in a different file than the error points at. Drill the compile-to-find-the-typo workflow now, before you meet it under pressure in the assignment.

  1. In models/marts/fct_trips.sql, introduce a deliberate typo: change one {{ ref('stg_trips') }} to {{ ref('stg_trps') }}.
  2. Run dbt compile --select fct_trips. Read the error message carefully: note which node dbt names and which file it points at.
  3. Fix the typo.
  4. Re-run dbt compile --select fct_trips. Confirm it now compiles cleanly.

Success criteria: you can state in one sentence what dbt's error told you, and can explain why the error appears at compile time rather than at run time. (Hint: compare with what would happen if you had typed the full table name in a plain SQL view instead.)

Exercise 4: Propagate a column change (Materializations & Layers)

Adding a column to a staging model and making it appear in the mart is the single most common dbt change in a real job. The chapter hands-on did not exercise it end to end. Do one now.

  1. Add trip_duration_minutes to models/staging/stg_trips.sql:
   extract(epoch from (dropoff_datetime - pickup_datetime)) / 60 as trip_duration_minutes
  1. Run dbt run --select stg_trips. Confirm the view rebuilds.
  2. Check the column exists in the view:
   SELECT trip_duration_minutes FROM dev_<your_name>.stg_trips LIMIT 3;
  1. Now try to query the same column from the mart:
   SELECT trip_duration_minutes FROM dev_<your_name>.fct_trips LIMIT 3;

This should error. Explain why.

  1. Add t.trip_duration_minutes to the select list in models/marts/fct_trips.sql.
  2. Rebuild the mart and its upstream: dbt run --select +fct_trips.
  3. Re-run the mart query from step 4. It now returns rows.

Success criteria: you can state in one sentence why stg_trips had the column but fct_trips did not, and what the + prefix in --select +fct_trips does that plain --select fct_trips would not.

Conceptual questions

These do not require running dbt: discuss with a classmate or write down your answers.

  1. Exercise 3 had you introduce a typo and catch it at compile time. If fct_trips were a plain SQL CREATE VIEW in Postgres instead of a dbt model, at which stage (write / save / query) would the same typo surface? Why does dbt's model compile its SQL separately from executing it?
  2. In Exercise 4, you added a column to stg_trips but querying it from fct_trips still failed until you also edited the mart. What does that tell you about how dbt propagates changes (and about why --select +fct_trips is the right pattern on a real team)?
  3. You add an accepted_values test on payment_type allowing [1, 2, 3, 4, 5]. A trip with payment_type = 6 (voided) appears in the source. What happens when you run dbt test, and what severity would you set on that test?

The HackYourFuture curriculum is licensed underΒ CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

CC BY-NC-SA 4.0 Icons

Built with ❀️ by the HackYourFuture community · Thank you, contributors

Found a mistake or have a suggestion? Let us know in the feedback form.