Going Further: Optional Deep Dives

This page is optional. Nothing here is required for Week 10's learning goals or the assignment. Use it after you finish the week if you want to keep learning about dbt, analytics engineering, or the broader modern data stack.

The resources below are organized by how much time they take and how they relate to Week 10. Pick what interests you. Do not try to work through all of it during the week itself.

Full courses and tutorials

Large structured resources that take hours to days to work through. Pick one if you want to reinforce this week's material with guided exercises on a different dataset or warehouse.

dbt Fundamentals (free course, dbt Labs): The official self-paced video course covering models, tests, documentation, and lineage. Uses BigQuery, but every concept transfers to the dbt Core + Azure PostgreSQL stack you learned this week.
DataTalks.Club DE Zoomcamp - Module 4: Analytics Engineering: A free 4-week module covering analytics engineering end-to-end on NYC taxi data, the same dataset you used this week. Includes video lectures, readings, and a homework assignment. Do not try to finish it during Week 10; budget for a separate week after the course.
DataTalks.Club DE Zoomcamp - Module 4 intro: The short intro README if you want to skim the module outline first before committing.
Start Data Engineering - dbt Tutorial: End-to-end walkthrough from install to docs, aimed at data engineers who already know SQL.

Deep dives on specific topics

Longer-form reading that goes beyond what the Week 10 chapters introduced.

How we structure our dbt projects (dbt Labs): The industry-standard folder and naming conventions for dbt projects. Read this once you are comfortable with the basic folder layout from Week 10.
dbt-utils package: A collection of community macros covering surrogate keys, pivots, date spines, and common test patterns. Most real dbt projects install this on day one.
dbt packages hub: Browse the full catalog of community packages for specific data sources and patterns.
dbt Semantic Layer overview: Metrics-as-code, the next layer beyond dbt models. Only relevant once you have a mature dbt project with multiple BI consumers.
SQLMesh: A newer alternative to dbt focused on virtual data environments and column-level lineage. Worth knowing about but not a replacement for learning dbt first.
dbt Fusion engine announcement (May 2025): The Rust-based rewrite of the dbt engine. See the History chapter for context on what this means for dbt Core.

Concepts covered later in the track

Two dbt features sit naturally next to the Week 10 material but need an environment the Week 10 dataset cannot provide. They are introduced and practiced in later weeks of the Data Track, where the scale and the schedule make them meaningful. Short previews here so you recognize the names if you run into them before then.

Incremental models

The fourth materialization (alongside view, table, and ephemeral). An incremental model rebuilds only the new rows instead of rewriting the whole table on every run. The pattern is mentioned in Chapter 4 and practiced later in the track on a 100M-row slice where a full table rebuild is no longer cheap.

The graduation rule is worth remembering even without the code: start every mart as a table. Switch to incremental only when the full rebuild takes longer than the freshness your consumers need: roughly, when the dataset crosses tens of millions of rows and the pipeline runs often enough that full rebuilds stack up.

<aside> 🤓 Curious Geek: why Week 10 uses one month and not the full 3 billion rows

Relocated from Docs & Extras, where the scale context is tangential to the chapter's docs/snapshot focus.

The TLC's public trip-record dataset is 3+ billion rows / ~267 GB of CSVs covering 2009 through today: yellow, green, and for-hire vehicles. At that scale a table rebuild takes tens of minutes on a typical warehouse and incremental stops being optional. We pinned Week 10 to January 2024 green taxi only (~57K rows) so every dbt run finishes in seconds. In a real project you start with one month for development, lock the logic, then switch to incremental and backfill. Two years of yellow taxi (~80-100M rows) is the rough threshold.

</aside>

Reference material if you want to read ahead:

Configure incremental models (dbt docs): the is_incremental() macro, {{ this }}, unique_key, and the merge / delete+insert / insert_overwrite strategies.
Incremental strategies explained (dbt docs): how each strategy handles updates, deletes, and late-arriving rows; the classic > vs >= bug that silently double-counts.

Source freshness

dbt has a freshness: config you attach to a source table: "the most recent row must have been loaded within the last N hours, otherwise warn or fail." Running dbt source freshness before the rest of the pipeline catches a stalled ingestion job before downstream models silently rebuild against stale data.

The feature only earns its keep when the ingestion is supposed to run on a schedule and sometimes does not. Week 10's January 2024 slice is a static snapshot, so dbt source freshness would always error here: not because anything broke, but because the dataset is permanently old. Freshness is practiced later in the track, where your pipeline first has a real schedule (ingestion → dbt source freshness → dbt build). When the ingestion step fails silently (network blip, auth expired, rate-limited), freshness is the gate that stops the pipeline before the transform ships yesterday's numbers as today's.

Source freshness config (dbt docs): reference for loaded_at_field, warn_after, error_after.
Test source freshness (dbt docs): short writeup of the common patterns teams use it for.

Ephemeral materialization

dbt's fifth materialization (view / table / incremental / snapshot are the four you meet in the track). An ephemeral model is never created in the database: dbt inlines its SQL as a CTE inside every downstream model that references it. The realistic case is shared helper logic you do not want analysts to discover or query directly.

-- models/intermediate/int_trips_cleaned.sql
{{ config(materialized='ephemeral') }}

select * from {{ ref('stg_trips') }}
where fare_amount > 0 and trip_distance > 0

Why most projects skip it. You cannot select * from an ephemeral model when debugging. Errors surface in the consuming model rather than at the ephemeral source. The inlined SQL bloats compiled queries on any project with more than a handful of refs. A view gets you the same "no storage cost" property with none of these drawbacks. Reach for ephemeral only when hiding the relation from the schema is the whole point.

Ephemeral materialization (dbt docs): config options and the full list of caveats.

Developer tooling

dbt Power User (VS Code extension)

The dbt Power User extension turns VS Code into an IDE that understands your dbt project. Not a dbt feature, not required, not part of the Week 10 assignment: but most professional dbt Core users install it on day one. It is the closest free equivalent to the paid dbt Cloud IDE, and if the company you join runs dbt Core (most do) this is probably the setup you will see on a senior engineer's screen.

Install and configure. Install dbt Power User by Innoverio from the VS Code Extensions pane, then open the folder containing your nyc_taxi/ project. The extension auto-detects dbt_project.yml and profiles.yml. If auto-detect fails, open the command palette and run dbt Power User: Setup Extension, pasting the path from which dbt (typically ~/.local/bin/dbt if you used uv tool install).

The three features you will use every day:

Autocomplete and jump-to-definition for ref() / source(). Type {{ ref(' inside a .sql file and pick from a dropdown of every model in the project. Cmd/Ctrl+Click on a model name jumps to its source. Eliminates the typo'd-ref bug class.
Compile-on-save. Save a .sql file and the compiled output updates in a side pane. Split your editor vertically, watch Jinja expand into plain SQL as you type. The single most useful feature when learning Jinja, and the fastest debugging loop for templating problems.
Inline SQL execution. Select a chunk of compiled SQL, hit dbt Power User: Execute SQL from the command palette, and results appear below the editor. No psql tab-switching. Two keystrokes between "I changed a where clause" and "row count confirmed."

Also worth knowing but not daily-driver: an interactive lineage graph in the sidebar (live version of dbt docs serve); inline SQL validation via sqlfluff that underlines syntax errors and unused CTEs; column-level lineage in the paid Power User Cloud tier for impact analysis on large projects; a snippet library for dbt-ref / dbt-source / dbt-config-table expansions. See the extension documentation for the full feature list.

When to turn it off. Disable the extension temporarily when debugging dbt run failures: the background compile-on-save can mask whether a failure is at parse, compile, or run time. Reproduce with dbt --debug run --select <model> in the terminal to get the raw output.

SQL formatters: sqlfmt and sqlfluff

Week 10 deliberately skips auto-formatting so you learn to read and write dbt's SQL + Jinja dialect by hand. Once the patterns are second-nature, a formatter saves bikeshedding in code review. The two common choices:

sqlfmt: opinionated, zero-config, Jinja-aware formatter for SQL. No style arguments, no dialect config: run sqlfmt models/ and it is done. The closest equivalent to black for SQL.
sqlfluff: configurable linter and formatter with Jinja support. More power, more setup: a .sqlfluff with dialect = postgres and templater = dbt is the starting point, and rules can be tuned per project.

Recommendation: start with sqlfmt if you want a formatter at all, and wire it into pre-commit in a later week when the rest of the project has a CI story. Not a Week 10 concern.

Videos

Individual zoomcamp videos paired with each Week 10 chapter are linked from the chapter itself (🎬 callout after Extra reading). The full module playlist lives under Full courses and tutorials above.

Community

The dbt Community Slack: the single best resource for practitioners

<aside> ❗ The one link to bookmark from this page. If you remember nothing else, remember the dbt Community Slack. Over 50,000 analytics engineers are active there, channels exist for every adapter (#db-postgres, #db-bigquery, #db-snowflake), every major package, every common use case (#advice-dbt-for-beginners, #advice-dbt-help, #learn-on-demand). Questions get answered in minutes, not days. It is free, it is welcoming, and it is where dbt Labs engineers and community maintainers hang out.

</aside>

Start with #advice-dbt-for-beginners when you have a "why doesn't this work?" question, and #learn-on-demand when you want to share what you are building and get feedback. Search first: most beginner questions have been answered multiple times, but never hesitate to post a fresh one.

Other community resources

Coalesce conference recordings: Talks from the annual dbt conference. A good source for how real companies structure their dbt projects.
dbt Discourse: Longer-form Q&A and discussion, good for searching specific error messages and deeper architectural debates.

Books

The Data Warehouse Toolkit (Kimball): The classic reference for dimensional modeling. Every dbt mart layer you build is a descendant of the patterns in this book.
Fundamentals of Data Engineering (Reis and Housley, 2022): A broader book on the modern data engineering lifecycle. Chapters 8 and 9 cover transformations and serving, which is where dbt fits.

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

CC BY-NC-SA 4.0 Icons

Built with ❤️ by the HackYourFuture community · Thank you, contributors

Found a mistake or have a suggestion? Let us know in the feedback form.