Week 10 - SQL Transformations with dbt

Introduction to dbt Core

dbt Setup for Azure PostgreSQL

SQL and Jinja Templating

Materializations & Layers

dbt Tests

Docs & Extras

Practice

Week 10 Assignment: Build a Daily Borough Stats Mart with dbt

Week 10 Gotchas & Pitfalls

Week 10 Lesson Plan (Teachers)

Career relevance: Week 10 in the NL data job market

Glossary

Going Further: Optional Deep Dives

Going Further: Optional Deep Dives

This page is optional. Nothing here is required for Week 10's learning goals or the assignment. Use it after you finish the week if you want to keep learning about dbt, analytics engineering, or the broader modern data stack.

The resources below are organized by how much time they take and how they relate to Week 10. Pick what interests you. Do not try to work through all of it during the week itself.

Full courses and tutorials

Large structured resources that take hours to days to work through. Pick one if you want to reinforce this week's material with guided exercises on a different dataset or warehouse.

Deep dives on specific topics

Longer-form reading that goes beyond what the Week 10 chapters introduced.

Concepts covered later in the track

Two dbt features sit naturally next to the Week 10 material but need an environment the Week 10 dataset cannot provide. They are introduced and practiced in later weeks of the Data Track, where the scale and the schedule make them meaningful. Short previews here so you recognize the names if you run into them before then.

Incremental models

The fourth materialization (alongside view, table, and ephemeral). An incremental model rebuilds only the new rows instead of rewriting the whole table on every run. The pattern is mentioned in Chapter 4 and practiced later in the track on a 100M-row slice where a full table rebuild is no longer cheap.

The graduation rule is worth remembering even without the code: start every mart as a table. Switch to incremental only when the full rebuild takes longer than the freshness your consumers need: roughly, when the dataset crosses tens of millions of rows and the pipeline runs often enough that full rebuilds stack up.

<aside> 🤓 Curious Geek: why Week 10 uses one month and not the full 3 billion rows

Relocated from Docs & Extras, where the scale context is tangential to the chapter's docs/snapshot focus.

The TLC's public trip-record dataset is 3+ billion rows / ~267 GB of CSVs covering 2009 through today: yellow, green, and for-hire vehicles. At that scale a table rebuild takes tens of minutes on a typical warehouse and incremental stops being optional. We pinned Week 10 to January 2024 green taxi only (~57K rows) so every dbt run finishes in seconds. In a real project you start with one month for development, lock the logic, then switch to incremental and backfill. Two years of yellow taxi (~80-100M rows) is the rough threshold.

</aside>

Reference material if you want to read ahead:

Source freshness

dbt has a freshness: config you attach to a source table: "the most recent row must have been loaded within the last N hours, otherwise warn or fail." Running dbt source freshness before the rest of the pipeline catches a stalled ingestion job before downstream models silently rebuild against stale data.

The feature only earns its keep when the ingestion is supposed to run on a schedule and sometimes does not. Week 10's January 2024 slice is a static snapshot, so dbt source freshness would always error here: not because anything broke, but because the dataset is permanently old. Freshness is practiced later in the track, where your pipeline first has a real schedule (ingestion → dbt source freshnessdbt build). When the ingestion step fails silently (network blip, auth expired, rate-limited), freshness is the gate that stops the pipeline before the transform ships yesterday's numbers as today's.

Ephemeral materialization

dbt's fifth materialization (view / table / incremental / snapshot are the four you meet in the track). An ephemeral model is never created in the database: dbt inlines its SQL as a CTE inside every downstream model that references it. The realistic case is shared helper logic you do not want analysts to discover or query directly.

-- models/intermediate/int_trips_cleaned.sql
{{ config(materialized='ephemeral') }}

select * from {{ ref('stg_trips') }}
where fare_amount > 0 and trip_distance > 0

Why most projects skip it. You cannot select * from an ephemeral model when debugging. Errors surface in the consuming model rather than at the ephemeral source. The inlined SQL bloats compiled queries on any project with more than a handful of refs. A view gets you the same "no storage cost" property with none of these drawbacks. Reach for ephemeral only when hiding the relation from the schema is the whole point.

Developer tooling

dbt Power User (VS Code extension)

The dbt Power User extension turns VS Code into an IDE that understands your dbt project. Not a dbt feature, not required, not part of the Week 10 assignment: but most professional dbt Core users install it on day one. It is the closest free equivalent to the paid dbt Cloud IDE, and if the company you join runs dbt Core (most do) this is probably the setup you will see on a senior engineer's screen.

Install and configure. Install dbt Power User by Innoverio from the VS Code Extensions pane, then open the folder containing your nyc_taxi/ project. The extension auto-detects dbt_project.yml and profiles.yml. If auto-detect fails, open the command palette and run dbt Power User: Setup Extension, pasting the path from which dbt (typically ~/.local/bin/dbt if you used uv tool install).

The three features you will use every day:

Also worth knowing but not daily-driver: an interactive lineage graph in the sidebar (live version of dbt docs serve); inline SQL validation via sqlfluff that underlines syntax errors and unused CTEs; column-level lineage in the paid Power User Cloud tier for impact analysis on large projects; a snippet library for dbt-ref / dbt-source / dbt-config-table expansions. See the extension documentation for the full feature list.

When to turn it off. Disable the extension temporarily when debugging dbt run failures: the background compile-on-save can mask whether a failure is at parse, compile, or run time. Reproduce with dbt --debug run --select <model> in the terminal to get the raw output.

SQL formatters: sqlfmt and sqlfluff

Week 10 deliberately skips auto-formatting so you learn to read and write dbt's SQL + Jinja dialect by hand. Once the patterns are second-nature, a formatter saves bikeshedding in code review. The two common choices:

Recommendation: start with sqlfmt if you want a formatter at all, and wire it into pre-commit in a later week when the rest of the project has a CI story. Not a Week 10 concern.

Videos

Individual zoomcamp videos paired with each Week 10 chapter are linked from the chapter itself (🎬 callout after Extra reading). The full module playlist lives under Full courses and tutorials above.

Community

The dbt Community Slack: the single best resource for practitioners

<aside> ❗ The one link to bookmark from this page. If you remember nothing else, remember the dbt Community Slack. Over 50,000 analytics engineers are active there, channels exist for every adapter (#db-postgres, #db-bigquery, #db-snowflake), every major package, every common use case (#advice-dbt-for-beginners, #advice-dbt-help, #learn-on-demand). Questions get answered in minutes, not days. It is free, it is welcoming, and it is where dbt Labs engineers and community maintainers hang out.

</aside>

Start with #advice-dbt-for-beginners when you have a "why doesn't this work?" question, and #learn-on-demand when you want to share what you are building and get feedback. Search first: most beginner questions have been answered multiple times, but never hesitate to post a fresh one.

Other community resources

Books


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

CC BY-NC-SA 4.0 Icons

Built with ❤️ by the HackYourFuture community · Thank you, contributors

Found a mistake or have a suggestion? Let us know in the feedback form.