⚠️ Week 10 Gotchas & Pitfalls

dbt feels friendly on day one and then silently misbehaves in week two. Use this list to debug faster: every gotcha below is something the chapters demonstrated, but that students tend to forget when they are moving quickly.

Each gotcha starts with what you might assume (the misconception), followed by what actually happens (the reality) and a short fix.

1. --select my_mart does not include upstream models

Introduced in Materializations & Layers and drilled in Practice Exercise 4.

The Misconception

"dbt run --select fct_trips rebuilds everything fct_trips depends on, just like the docs say 'dbt respects dependencies.'"

The Reality

The DAG respects dependencies when dbt decides what to run, but --select names exactly one node. If stg_trips or stg_zones has not been built, fct_trips references a table that does not exist and fails with relation ... does not exist. Worse, if they have been built but are stale, fct_trips builds against stale data.

The Fix: use the graph operator + to include upstream: dbt run --select +fct_trips. Use fct_trips+ for downstream. For an end-to-end build with tests interleaved, dbt build is almost always what you want.
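In command form (model names follow the chapter's fct_trips example):

```shell
# Build fct_trips and everything it depends on (stg_trips, stg_zones, ...)
dbt run --select +fct_trips

# Build fct_trips and everything that depends on it
dbt run --select fct_trips+

# Build and test the whole project in dependency order
dbt build
```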

2. Hardcoded table names break the DAG

Introduced in dbt Setup for Azure PostgreSQL and SQL and Jinja Templating.

The Misconception

"from public.raw_trips works in psql, so it works in my dbt model."

The Reality

It works today, and silently breaks tomorrow. Hardcoding public.raw_trips means dbt does not know your model depends on that source. Your model is missing from the lineage graph, dbt run --select +fct_trips will not build upstream, and renaming the source table will not surface a single dbt error. Same trap for hardcoding dev_<your_name>.stg_trips instead of {{ ref('stg_trips') }}: dbt cannot route the correct schema for dev vs prod targets, and your dev work silently writes into the wrong environment.

The Fix: every raw-table reference goes through {{ source('schema', 'table') }}. Every dbt-built table reference goes through {{ ref('model_name') }}. No exceptions, even for quick prototypes.
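A minimal sketch of the same staging model written both ways. The source name raw and the table name trips are assumptions; use whatever your sources YAML declares:

```sql
-- Before: dbt has no idea this model depends on public.raw_trips,
-- so it is invisible to lineage and to --select +fct_trips
select * from public.raw_trips

-- After: the dependency is in the DAG, and dbt resolves the
-- schema correctly for dev vs prod targets
select * from {{ source('raw', 'trips') }}
```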

3. dbt run deploys untested models

Introduced in dbt build: models and tests together.

The Misconception

"I run dbt run for the build, dbt test later when I feel like it. If something is wrong my next dbt test will catch it."

The Reality

dbt run rebuilds every model without checking any tests. A not_null violation in stg_trips does not stop fct_trips from being rebuilt on top of bad data. By the time you get around to dbt test, your dashboards have already been fed yesterday's broken numbers for hours.

The Fix: use dbt build in CI, scheduled jobs, and anywhere numbers are consumed. dbt build interleaves models and tests in DAG order: a failing not_null on stg_trips stops fct_trips from building at all. Keep dbt run / dbt test separate only for local iteration, where you want to skip the test phase on a quick compile check.
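The two workflows side by side:

```shell
# CI and scheduled jobs: models and tests interleaved in DAG order,
# so a test failure halts everything downstream of it
dbt build

# Local iteration only: run without the test gate, test when ready
dbt run --select stg_trips
dbt test --select stg_trips
```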

4. Defaulting every model to table

Introduced in view (default): no storage, always fresh.

The Misconception

"table is fast to query, view is slow because it recomputes. I will materialize everything as table to be safe."

The Reality

Every table materialization runs a full DROP + CREATE TABLE AS SELECT on each dbt run. On a staging model that only exists to rename columns, that cost is pure overhead: only downstream models read from the table, and nothing a dashboard queries lives there. On Azure Postgres specifically, the DROP + CREATE churn generates write-ahead-log entries that accumulate until the retention window expires.

The Fix: staging models get view (a read-only layer that only downstream models query). Intermediate models get view. Marts, the ones dashboards and analysts query, get table. The table in Choosing a materialization is the short version.
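As a folder-level default in dbt_project.yml (project and folder names here are placeholders, not the course's exact layout):

```yaml
# dbt_project.yml (sketch)
models:
  my_project:
    staging:
      +materialized: view   # rename/cast layer: no storage, always fresh
    intermediate:
      +materialized: view
    marts:
      +materialized: table  # queried by dashboards: pay the build cost once
```

An individual model can still override this with a {{ config(materialized='...') }} block at the top of its SQL file.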

5. profiles.yml silently in ~/.dbt/ instead of the project

Introduced in Configure a project-local profiles.yml.

The Misconception

"dbt found a profile, my model ran, everything is fine."

The Reality

dbt looks for profiles.yml first in the current directory, then in ~/.dbt/profiles.yml. If you have an old profile in ~/.dbt/ from a previous tutorial, dbt finds and uses it, pointing at whatever database that old profile named, which may not be this course's Azure Postgres. You build models into the wrong database without seeing a single error.

The Fix: keep a project-local profiles.yml (the course pattern). Always launch dbt from the project folder. dbt debug prints the path to the profile it actually loaded: read that line, especially after a git clone on a new machine.
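A project-local profile looks roughly like this; every value below is a placeholder, and the profile name must match the profile: key in your dbt_project.yml:

```yaml
# ./profiles.yml in the project root (sketch, all values are placeholders)
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: <your-server>.postgres.database.azure.com
      port: 5432
      user: <your_user>
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: <your_db>
      schema: dev_<your_name>
```

After cloning on a new machine, run dbt debug from the project folder and check the line that reports which profiles.yml path was loaded before trusting any run.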

6. accepted_values and other generic tests ignore NULLs

Introduced in accepted_values: the column only contains known values.

The Misconception

"I wrote accepted_values: [1, 2, 3, 4, 5, 6] on payment_type. If a row has payment_type = NULL, the test fails."

The Reality

Generic tests like accepted_values, relationships, and (in practice) most user-written singular tests do not match NULL values. A NULL is "neither in nor out of the set," so the row silently passes. On the January 2024 dataset, ~6% of trips have a NULL payment_type: every one of those would slip past accepted_values without not_null paired on the same column.

The Fix: pair accepted_values with an explicit not_null test on the same column (downgraded to severity: warn if NULLs are a known data quality issue, as the _stg_trips.yml from dbt Tests does). Same rule for relationships and most singular tests: assume NULL is never what you meant to allow unless you opted in.
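The pairing in YAML looks like this (a sketch in the shape of the chapter's _stg_trips.yml; the exact file contents are not reproduced here):

```yaml
# models/staging/_stg_trips.yml (sketch)
version: 2

models:
  - name: stg_trips
    columns:
      - name: payment_type
        tests:
          - accepted_values:
              values: [1, 2, 3, 4, 5, 6]
          - not_null:
              config:
                severity: warn  # NULLs are a known data quality issue here
```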

7. Adding a column to stg_trips does not make it appear in fct_trips

Drilled in Practice Exercise 4.

The Misconception

"I added trip_duration_minutes to stg_trips and ran dbt run. fct_trips will pick it up because it refs stg_trips."

The Reality

fct_trips.sql has an explicit select list. Columns missing from that list are not in the mart, full stop: dbt does not propagate schema changes automatically. Worse, the mart's row count and existing columns look correct, so a query that does not reference the new column returns without error.

The Fix: every schema change is a three-step routine. Add the column to stg_trips.sql. Add it to fct_trips.sql's select list (plus a description: in the YAML so docs stay accurate). Rebuild upstream: dbt run --select +fct_trips. If the mart is consumed by a dashboard, call out the new column in the PR so downstream owners know.
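The mart-side half of the routine, sketched (column names other than trip_duration_minutes are illustrative):

```sql
-- fct_trips.sql (excerpt): an explicit select list, so the new
-- column must be added here or it never reaches the mart
select
    trip_id,
    pickup_zone_id,
    fare_amount,
    trip_duration_minutes  -- newly added in stg_trips; invisible until listed
from {{ ref('stg_trips') }}
```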

8. dbt docs generate without running models first shows empty columns

Introduced in dbt docs generate and dbt docs serve.

The Misconception

"I added descriptions to my YAML files, so dbt docs generate should produce a docs site with all my models and columns."

The Reality

dbt docs generate introspects the warehouse to discover column types and row counts: the step labelled Building catalog in the command's output. If the models have not been built yet (or if you are on a fresh dev_<your_name> schema where nothing exists), the catalog comes back empty, and the docs site shows model descriptions but no column list or types. dbt prints a small warning about "relations not found in the catalog" that is easy to miss.

The Fix: before dbt docs generate, run dbt run --select +fct_trips (or dbt build if you also want tests) against your target schema. Then regenerate the docs. If you just reset your dev schema, always do a full rebuild before a docs build.
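The safe sequence after resetting a dev schema:

```shell
# Rebuild the warehouse objects first so the catalog step can find them
dbt build --select +fct_trips

# Then generate the catalog and browse the site
dbt docs generate
dbt docs serve
```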

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

