Week 10 - SQL Transformations with dbt
dbt Setup for Azure PostgreSQL
Incremental Models, Docs, and Snapshots
Week 10 Assignment: Build a Daily Borough Stats Mart with dbt
Week 10 Lesson Plan (Teachers)
Content
<aside>
π These exercises are optional but strongly recommended. Each one drills a single skill that the chapter hands-on sections did not cover end-to-end. Work through them in order: Exercise 1 is a hard prerequisite for the fct_trips mart, and the later exercises assume the project state from Ch2βCh6.
</aside>
By the end of this chapter, you should have:
stg_trips model with tip_pct, fare_per_mile, and payment_type_label columns, plus a safe_divide macro.tests/ folder that catches a business-rule violation.stg_trips (SQL and Jinja Templating)Materializations & Layers builds the fct_trips mart assuming stg_trips exposes tip_pct, fare_per_mile, and payment_type_label. This exercise adds those columns. Do it before you run dbt build --select +fct_trips in the assignment.
macros/safe_divide.sql with the macro from SQL and Jinja Templating.models/staging/stg_trips.sql to select two new derived columns:{{ safe_divide('tip_amount', 'fare_amount') }} as tip_pct{{ safe_divide('fare_amount', 'trip_distance') }} as fare_per_milepayment_type_label column using the {% for %} mapping from the Jinja chapter (codes 1-6 β human labels).dbt compile --select stg_trips and open target/compiled/nyc_taxi/models/staging/stg_trips.sql. Confirm every {{ ... }} has been resolved to plain SQL.dbt run --select stg_trips and spot-check: SELECT payment_type, payment_type_label, tip_pct, fare_per_mile
FROM dev_<your_name>.stg_trips
LIMIT 10;
Success criteria: every payment_type code resolves to a readable label, and tip_pct / fare_per_mile are non-null for rows with non-zero denominators.
Singular tests are plain .sql files that return the bad rows: introduced in dbt Tests but not drilled. Write one yourself.
tests/assert_fare_amount_non_negative.sql containing a select against {{ ref('stg_trips') }} that returns every row whose fare_amount is negative. Recall: an empty result passes, any returned rows fail.dbt test --select test_type:singular. The test will fail with FAIL 182 against the January 2024 dataset: there are 182 rows where the TLC has recorded a negative fare (most often credit-card refunds or voided trips re-entered as corrections).target/compiled/nyc_taxi/tests/assert_fare_amount_non_negative.sql and run it directly in psql to inspect the offending rows. The compiled form is the "show me the bad rows" query you would write by hand: no debugging setup needed.error-severity test blocks the pipeline. Real teams either (a) downgrade to warn severity by moving the test into YAML with a custom test or keeping it as a singular test and editing the singular-test config in dbt_project.yml, or (b) filter negative fares out in stg_trips so fct_trips never sees them. For this exercise, edit stg_trips.sql to add and fare_amount >= 0 to the where clause, re-run dbt run --select stg_trips, and re-run dbt test --select test_type:singular. The test should now pass.Success criteria: you can show that the test originally failed with 182 rows, and after filtering negative fares out of stg_trips the test passes cleanly. You can state in one sentence why a singular test is the right tool for this kind of check rather than a generic YAML test.
ref() (SQL and Jinja Templating)When dbt run fails because of a bad ref(), the error lands at parse or compile time: not at run time: and the fix lives in a different file than the error points at. Drill the compile-to-find-the-typo workflow now, before you meet it under pressure in the assignment.
models/marts/fct_trips.sql, introduce a deliberate typo: change one {{ ref('stg_trips') }} to {{ ref('stg_trps') }}.dbt compile --select fct_trips. Read the error message carefully: note which node dbt names and which file it points at.dbt compile --select fct_trips. Confirm it now compiles cleanly.Success criteria: you can state in one sentence what dbt's error told you, and can explain why the error appears at compile time rather than at run time. (Hint: compare with what would happen if you had typed the full table name in a plain SQL view instead.)
Adding a column to a staging model and making it appear in the mart is the single most common dbt change in a real job. The chapter hands-on did not exercise it end to end. Do one now.
trip_duration_minutes to models/staging/stg_trips.sql: extract(epoch from (dropoff_datetime - pickup_datetime)) / 60 as trip_duration_minutes
dbt run --select stg_trips. Confirm the view rebuilds. SELECT trip_duration_minutes FROM dev_<your_name>.stg_trips LIMIT 3;
SELECT trip_duration_minutes FROM dev_<your_name>.fct_trips LIMIT 3;
This should error. Explain why.
t.trip_duration_minutes to the select list in models/marts/fct_trips.sql.dbt run --select +fct_trips.Success criteria: you can state in one sentence why stg_trips had the column but fct_trips did not, and what the + prefix in --select +fct_trips does that plain --select fct_trips would not.
These do not require running dbt: discuss with a classmate or write down your answers.
fct_trips were a plain SQL CREATE VIEW in Postgres instead of a dbt model, at which stage (write / save / query) would the same typo surface? Why does dbt's model compile its SQL separately from executing it?stg_trips but querying it from fct_trips still failed until you also edited the mart. What does that tell you about how dbt propagates changes (and about why --select +fct_trips is the right pattern on a real team)?accepted_values test on payment_type allowing [1, 2, 3, 4, 5]. A trip with payment_type = 6 (voided) appears in the source. What happens when you run dbt test, and what severity would you set on that test?The HackYourFuture curriculum is licensed underΒ CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with β€οΈ by the HackYourFuture community Β· Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.