Week 10 - SQL Transformations with dbt
dbt feels friendly on day one and then silently misbehaves in week two. Use this list to debug faster: every gotcha below is something the chapters demonstrated, but that students tend to forget when they are moving quickly.
Each gotcha starts with what you might assume (the misconception), followed by what actually happens (the reality) and a short fix.
--select my_mart does not include upstream models
Introduced in Materializations & Layers and drilled in Practice Exercise 4.
"dbt run --select fct_trips rebuilds everything fct_trips depends on, just like the docs say 'dbt respects dependencies.'"
The DAG respects dependencies when dbt decides what to run, but --select names exactly one node. If stg_trips or stg_zones have not been built, fct_trips references a table that does not exist and fails with relation ... does not exist. Worse, if they have been built but are stale, fct_trips builds against stale data.
The Fix: use the graph operator + to include upstream: dbt run --select +fct_trips. Use fct_trips+ for downstream. For an end-to-end build with tests interleaved, dbt build is almost always what you want.
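A minimal sketch of the selection syntax, using the chapter's models (stg_trips, stg_zones, fct_trips) as examples:

    # only fct_trips: fails (or builds on stale data) if the staging models are not in place
    dbt run --select fct_trips

    # + before the name: fct_trips plus everything it depends on
    dbt run --select +fct_trips

    # + after the name: fct_trips plus everything that depends on it
    dbt run --select fct_trips+

    # models and tests together, in DAG order, for the same selection
    dbt build --select +fct_trips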
Hardcoded table names instead of {{ source() }} and {{ ref() }}
Introduced in dbt Setup for Azure PostgreSQL and SQL and Jinja Templating.
"from public.raw_trips works in psql, so it works in my dbt model."
It works today, and silently breaks tomorrow. Hardcoding public.raw_trips means dbt does not know your model depends on that source. Your model is missing from the lineage graph, dbt run --select +fct_trips will not build upstream, and renaming the source table will not surface a single dbt error. Same trap for hardcoding dev_<your_name>.stg_trips instead of {{ ref('stg_trips') }}: dbt cannot route the correct schema for dev vs prod targets, and your dev work silently writes into the wrong environment.
The Fix: every raw-table reference goes through {{ source('schema', 'table') }}. Every dbt-built table reference goes through {{ ref('model_name') }}. No exceptions, even for quick prototypes.
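A minimal sketch of the pattern, assuming a source named public that maps onto the raw schema (file names here are illustrative):

    # models/staging/_sources.yml
    version: 2
    sources:
      - name: public           # referenced as source('public', ...)
        schema: public         # the actual schema holding the raw tables
        tables:
          - name: raw_trips

    -- models/staging/stg_trips.sql
    select * from {{ source('public', 'raw_trips') }}

    -- models/marts/fct_trips.sql
    select * from {{ ref('stg_trips') }}

With both macros in place, dbt draws the lineage graph correctly and swaps in the right schema (dev_<your_name> or prod) per target.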
dbt run deploys untested models
Introduced in dbt build: models and tests together.
"I run dbt run for the build, dbt test later when I feel like it. If something is wrong my next dbt test will catch it."
dbt run rebuilds every model without checking any tests. A not_null violation in stg_trips does not stop fct_trips from being rebuilt on top of bad data. By the time you get around to dbt test, your dashboards have already been fed yesterday's broken numbers for hours.
The Fix: use dbt build in CI, scheduled jobs, and anywhere numbers are consumed. dbt build interleaves models and tests in DAG order: a failing not_null on stg_trips stops fct_trips from building at all. Keep dbt run / dbt test as separate steps only for local iteration, where you deliberately want to skip the test phase during a quick compile check.
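As a sketch, the difference in commands (how you schedule them is up to your CI tool):

    # local iteration: fine, but models rebuild even when tests would have failed
    dbt run --select stg_trips
    dbt test --select stg_trips

    # CI and scheduled jobs: each model's tests run right after it, in DAG order,
    # so a failing not_null on stg_trips stops fct_trips from ever building
    dbt build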
Materializing everything as table
Introduced in view (default): no storage, always fresh.
"table is fast to query, view is slow because it recomputes. I will materialize everything as table to be safe."
Every table materialization runs a full DROP + CREATE TABLE AS SELECT on each dbt run. On a staging model that only exists to rename columns, that cost is pure overhead: downstream models read from it during the run, but no dashboard ever queries it directly. On Azure Postgres specifically, the DROP + CREATE churn generates write-ahead-log entries that accumulate until the retention window expires.
The Fix: staging models get view (a thin rename layer that only downstream models read). Intermediate models get view. Marts, the ones dashboards and analysts query, get table. The table in Choosing a materialization is the short version.
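One way to encode that rule once instead of per model, assuming a staging/, intermediate/, marts/ folder layout and a project named my_project (both placeholders):

    # dbt_project.yml (excerpt)
    models:
      my_project:
        staging:
          +materialized: view     # rename/retype layer: no storage, always fresh
        intermediate:
          +materialized: view
        marts:
          +materialized: table    # what dashboards and analysts actually query

An individual model can still override the folder default with {{ config(materialized='table') }} at the top of its .sql file.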
profiles.yml silently loaded from ~/.dbt/ instead of the project
Introduced in Configure a project-local profiles.yml.
"dbt found a profile, my model ran, everything is fine."
dbt looks for profiles.yml first in the current directory, then in ~/.dbt/profiles.yml. If you have an old profile in ~/.dbt/ from a previous tutorial, dbt finds and uses it, pointing at whatever database that old profile names, which may not be this course's Azure Postgres. You build models into the wrong database without seeing a single error.
The Fix: keep a project-local profiles.yml (the course pattern). Always launch dbt from the project folder. dbt debug prints the path to the profile it actually loaded: read that line, especially after a git clone on a new machine.
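A sketch of the project-local profile, with placeholders instead of real values; the top-level key (taxi_project here) is illustrative and must match the profile: entry in dbt_project.yml:

    # ./profiles.yml  -- lives next to dbt_project.yml, not in ~/.dbt/
    taxi_project:
      target: dev
      outputs:
        dev:
          type: postgres
          host: <your-server>.postgres.database.azure.com
          port: 5432
          user: <your-user>
          password: "{{ env_var('DBT_PASSWORD') }}"   # keep the secret out of the file
          dbname: <your-database>
          schema: dev_<your_name>
          sslmode: require

    # then confirm which profile file dbt actually picked up
    dbt debug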
accepted_values and other generic tests ignore NULLs
Introduced in accepted_values: the column only contains known values.
"I wrote accepted_values: [1, 2, 3, 4, 5, 6] on payment_type. If a row has payment_type = NULL, the test fails."
Generic tests like accepted_values, relationships, and (in practice) most user-written singular tests do not flag NULL values. A NULL is "neither in nor out of the set," so the row silently passes. On the January 2024 dataset, ~6% of trips have payment_type IS NULL; every one of those slips past accepted_values unless a not_null test is paired on the same column.
The Fix: pair accepted_values with an explicit not_null test on the same column (downgraded to severity: warn if NULLs are a known data quality issue, as the _stg_trips.yml from dbt Tests does). Same rule for relationships and most singular tests: assume NULL is never what you meant to allow unless you opted in.
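A sketch of the pairing in _stg_trips.yml; severity: warn keeps known NULLs visible without failing the build, and quote: false assumes payment_type is stored as a number rather than text:

    # models/staging/_stg_trips.yml (excerpt)
    version: 2
    models:
      - name: stg_trips
        columns:
          - name: payment_type
            tests:
              - not_null:
                  config:
                    severity: warn      # known data quality issue: flag it, do not fail the build
              - accepted_values:
                  values: [1, 2, 3, 4, 5, 6]
                  quote: false          # compare as numbers, not quoted strings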
Adding a column to stg_trips does not make it appear in fct_trips
Drilled in Practice Exercise 4.
"I added trip_duration_minutes to stg_trips and ran dbt run. fct_trips will pick it up because it refs stg_trips."
fct_trips.sql has an explicit select list. Columns missing from that list are not in the mart, full stop: dbt does not propagate schema changes automatically. Worse, the mart's row count and existing columns look correct, so a query that does not reference the new column returns without error.
The Fix: every schema change is a three-step routine. Add the column to stg_trips.sql. Add it to fct_trips.sql's select list (plus a description: entry in the YAML so docs stay accurate). Rebuild upstream and the mart together: dbt run --select +fct_trips. If the mart is consumed by a dashboard, call out the new column in the PR so downstream owners know.
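A compressed sketch of the routine for trip_duration_minutes (column names and the duration formula are illustrative):

    -- 1. models/staging/stg_trips.sql: derive the new column
    select
        trip_id,
        pickup_datetime,
        dropoff_datetime,
        extract(epoch from (dropoff_datetime - pickup_datetime)) / 60.0
            as trip_duration_minutes
    from {{ source('public', 'raw_trips') }}

    -- 2. models/marts/fct_trips.sql: add it to the explicit select list
    --    (and add a description: entry for it in the mart's YAML)
    select
        trip_id,
        pickup_datetime,
        trip_duration_minutes
    from {{ ref('stg_trips') }}

    -- 3. rebuild upstream and the mart together:
    --    dbt run --select +fct_trips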
dbt docs generate without running models first shows empty columns
Introduced in dbt docs generate and dbt docs serve.
"I added descriptions to my YAML files, so dbt docs generate should produce a docs site with all my models and columns."
dbt docs generate introspects the warehouse to discover column types and row counts: the step labelled Building catalog in the command's output. If the models have not been built yet (or if you are on a fresh dev_<your_name> schema where nothing exists), the catalog comes back empty, and the docs site shows model descriptions but no column list or types. dbt prints a small warning about "relations not found in the catalog" that is easy to miss.
The Fix: before dbt docs generate, run dbt run --select +fct_trips (or dbt build if you also want tests) against your target schema. Then regenerate the docs. If you just reset your dev schema, always do a full rebuild before a docs build.
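The working order, as a sketch:

    dbt build --select +fct_trips   # build (and test) the mart and everything upstream
    dbt docs generate               # "Building catalog" now finds real relations
    dbt docs serve                  # browse the generated site locally (default port 8080)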