Week 10 - SQL Transformations with dbt
dbt Setup for Azure PostgreSQL
Week 10 Assignment: Build a Daily Borough Stats Mart with dbt
Week 10 Lesson Plan (Teachers)
Career relevance: Week 10 in the NL data job market
Going Further: Optional Deep Dives
This page is optional. Nothing here is required for Week 10's learning goals or the assignment. Use it after you finish the week if you want to keep learning about dbt, analytics engineering, or the broader modern data stack.
The resources below are organized by how much time they take and how they relate to Week 10. Pick what interests you. Do not try to work through all of it during the week itself.
Large structured resources that take hours to days to work through. Pick one if you want to reinforce this week's material with guided exercises on a different dataset or warehouse.
Longer-form reading that goes beyond what the Week 10 chapters introduced.
Two dbt features sit naturally next to the Week 10 material but need an environment the Week 10 dataset cannot provide. They are introduced and practiced in later weeks of the Data Track, where the scale and the schedule make them meaningful. Short previews here so you recognize the names if you run into them before then.
The fourth materialization (alongside view, table, and ephemeral). An incremental model rebuilds only the new rows instead of rewriting the whole table on every run. The pattern is mentioned in Chapter 4 and practiced later in the track on a 100M-row slice where a full table rebuild is no longer cheap.
The graduation rule is worth remembering even without the code: start every mart as a table. Switch to incremental only when the full rebuild takes longer than the freshness your consumers need: roughly, when the dataset crosses tens of millions of rows and the pipeline runs often enough that full rebuilds stack up.
<aside> 🤓 Curious Geek: why Week 10 uses one month and not the full 3 billion rows
Relocated from Docs & Extras, where the scale context is tangential to the chapter's docs/snapshot focus.
The TLC's public trip-record dataset is 3+ billion rows / ~267 GB of CSVs covering 2009 through today: yellow, green, and for-hire vehicles. At that scale a table rebuild takes tens of minutes on a typical warehouse and incremental stops being optional. We pinned Week 10 to January 2024 green taxi only (~57K rows) so every dbt run finishes in seconds. In a real project you start with one month for development, lock the logic, then switch to incremental and backfill. Two years of yellow taxi (~80-100M rows) is the rough threshold.
</aside>
Reference material if you want to read ahead:
is_incremental() macro, {{ this }}, unique_key, and the merge / delete+insert / insert_overwrite strategies.> vs >= bug that silently double-counts.dbt has a freshness: config you attach to a source table: "the most recent row must have been loaded within the last N hours, otherwise warn or fail." Running dbt source freshness before the rest of the pipeline catches a stalled ingestion job before downstream models silently rebuild against stale data.
The feature only earns its keep when the ingestion is supposed to run on a schedule and sometimes does not. Week 10's January 2024 slice is a static snapshot, so dbt source freshness would always error here: not because anything broke, but because the dataset is permanently old. Freshness is practiced later in the track, where your pipeline first has a real schedule (ingestion → dbt source freshness → dbt build). When the ingestion step fails silently (network blip, auth expired, rate-limited), freshness is the gate that stops the pipeline before the transform ships yesterday's numbers as today's.
loaded_at_field, warn_after, error_after.dbt's fifth materialization (view / table / incremental / snapshot are the four you meet in the track). An ephemeral model is never created in the database: dbt inlines its SQL as a CTE inside every downstream model that references it. The realistic case is shared helper logic you do not want analysts to discover or query directly.
-- models/intermediate/int_trips_cleaned.sql
{{ config(materialized='ephemeral') }}
select * from {{ ref('stg_trips') }}
where fare_amount > 0 and trip_distance > 0
Why most projects skip it. You cannot select * from an ephemeral model when debugging. Errors surface in the consuming model rather than at the ephemeral source. The inlined SQL bloats compiled queries on any project with more than a handful of refs. A view gets you the same "no storage cost" property with none of these drawbacks. Reach for ephemeral only when hiding the relation from the schema is the whole point.
The dbt Power User extension turns VS Code into an IDE that understands your dbt project. Not a dbt feature, not required, not part of the Week 10 assignment: but most professional dbt Core users install it on day one. It is the closest free equivalent to the paid dbt Cloud IDE, and if the company you join runs dbt Core (most do) this is probably the setup you will see on a senior engineer's screen.
Install and configure. Install dbt Power User by Innoverio from the VS Code Extensions pane, then open the folder containing your nyc_taxi/ project. The extension auto-detects dbt_project.yml and profiles.yml. If auto-detect fails, open the command palette and run dbt Power User: Setup Extension, pasting the path from which dbt (typically ~/.local/bin/dbt if you used uv tool install).
The three features you will use every day:
ref() / source(). Type {{ ref(' inside a .sql file and pick from a dropdown of every model in the project. Cmd/Ctrl+Click on a model name jumps to its source. Eliminates the typo'd-ref bug class..sql file and the compiled output updates in a side pane. Split your editor vertically, watch Jinja expand into plain SQL as you type. The single most useful feature when learning Jinja, and the fastest debugging loop for templating problems.psql tab-switching. Two keystrokes between "I changed a where clause" and "row count confirmed."Also worth knowing but not daily-driver: an interactive lineage graph in the sidebar (live version of dbt docs serve); inline SQL validation via sqlfluff that underlines syntax errors and unused CTEs; column-level lineage in the paid Power User Cloud tier for impact analysis on large projects; a snippet library for dbt-ref / dbt-source / dbt-config-table expansions. See the extension documentation for the full feature list.
When to turn it off. Disable the extension temporarily when debugging dbt run failures: the background compile-on-save can mask whether a failure is at parse, compile, or run time. Reproduce with dbt --debug run --select <model> in the terminal to get the raw output.
Week 10 deliberately skips auto-formatting so you learn to read and write dbt's SQL + Jinja dialect by hand. Once the patterns are second-nature, a formatter saves bikeshedding in code review. The two common choices:
sqlfmt models/ and it is done. The closest equivalent to black for SQL..sqlfluff with dialect = postgres and templater = dbt is the starting point, and rules can be tuned per project.Recommendation: start with sqlfmt if you want a formatter at all, and wire it into pre-commit in a later week when the rest of the project has a CI story. Not a Week 10 concern.
Individual zoomcamp videos paired with each Week 10 chapter are linked from the chapter itself (🎬 callout after Extra reading). The full module playlist lives under Full courses and tutorials above.
<aside>
❗ The one link to bookmark from this page. If you remember nothing else, remember the dbt Community Slack. Over 50,000 analytics engineers are active there, channels exist for every adapter (#db-postgres, #db-bigquery, #db-snowflake), every major package, every common use case (#advice-dbt-for-beginners, #advice-dbt-help, #learn-on-demand). Questions get answered in minutes, not days. It is free, it is welcoming, and it is where dbt Labs engineers and community maintainers hang out.
</aside>
Start with #advice-dbt-for-beginners when you have a "why doesn't this work?" question, and #learn-on-demand when you want to share what you are building and get feedback. Search first: most beginner questions have been answered multiple times, but never hesitate to post a fresh one.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.