Week 10 - SQL Transformations with dbt

Introduction to dbt Core

dbt Setup for Azure PostgreSQL

SQL and Jinja Templating

dbt Tests

Incremental Models, Docs, and Snapshots

Building Your First Mart

Week 10 Gotchas & Pitfalls

Practice

Week 10 Assignment: Build a Daily Borough Stats Mart with dbt

Week 10 - Video Guide

Week 10 Lesson Plan (Teachers)

🎒 Week 10 Assignment: Build a Daily Borough Stats Mart with dbt

The chapter hands-on walked you through building fct_trips, a one-row-per-trip mart, step by step. The practice exercises drilled specific skills (macros, singular tests, debugging refs, propagating column changes) against that same project.

This assignment asks you to build a second, independent dbt project end to end, against the same raw_trips and raw_zones source data, but producing a mart at a different grain: one row per pickup_borough per date. You will redo every step (sources → staging → mart → tests → docs) on your own, without the chapter's line-by-line guidance. The goal is to prove you can take a business question and deliver a versioned, tested, documented dbt project that answers it.

By the end of this assignment, you should be able to:

The business question

Your stakeholder (imagine a product manager for NYC Green Taxi) wants a dashboard showing, for each day in January 2024:

Grain of the mart: one row per (pickup_borough, pickup_date).

<aside> 💡 Do not reuse your Week 10 project folder. Start a fresh nyc_taxi_borough_daily/ project. You will reuse the connection settings from Chapter 2, but the models, tests, and docs are new.

</aside>

Task 1: Initialise a new dbt project

Create a new folder outside your Week 10 project (e.g. ~/projects/nyc_taxi_borough_daily/). Run dbt init nyc_taxi_borough_daily --skip-profile-setup, then add a project-local profiles.yml pointing at the same shared Azure PostgreSQL instance and a personal dev_<your_name>_borough schema.
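A minimal project-local profiles.yml might look like the sketch below. The host, user, and database values are placeholders for your Chapter 2 connection settings, and reading the password from an environment variable is one way to keep it out of the file:

```yaml
# profiles.yml (project-local -- keep it out of git via .gitignore)
# Host, user, and dbname below are placeholders; use your Chapter 2 settings.
nyc_taxi_borough_daily:
  target: dev
  outputs:
    dev:
      type: postgres
      host: <your-server>.postgres.database.azure.com
      port: 5432
      user: <your_user>
      password: "{{ env_var('DBT_PASSWORD') }}"   # avoids hard-coding the password
      dbname: <your_database>
      schema: dev_<your_name>_borough
      threads: 4
      sslmode: require
```

The top-level key must match the `profile:` value in your dbt_project.yml.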

<aside> ⚠️ Delete the dbt init scaffolding. The generator creates models/example/my_first_dbt_model.sql + my_second_dbt_model.sql + a schema.yml with tests attached. If you leave these in place, dbt build will report FAIL 1 not_null_my_first_dbt_model_id and block your submission's ERROR=0 requirement. Run rm -rf models/example/ and remove the example: block from dbt_project.yml's models: config before you continue.

</aside>

<aside> ⚠️ Do not reuse the dev_<your_name> schema from your Week 10 chapter work; a separate dev_<your_name>_borough schema keeps the assignment builds isolated. It also makes the reference repo's checkpoint tags irrelevant to your submission.

</aside>

Run dbt debug and confirm All checks passed!.
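Once the example: block is removed, the models: section of dbt_project.yml can set the default materializations you will need later (staging as views, the mart as a table). One possible excerpt, assuming the project name above:

```yaml
# dbt_project.yml (excerpt) -- a sketch; names must match your project
name: nyc_taxi_borough_daily
profile: nyc_taxi_borough_daily

models:
  nyc_taxi_borough_daily:
    staging:
      +materialized: view    # default for everything under models/staging/
    marts:
      +materialized: table   # default for everything under models/marts/
```

Folder-level defaults like these can still be overridden per model with a `{{ config(...) }}` block.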

Task 2: Declare sources

Create models/staging/_sources.yml that declares raw_trips and raw_zones in public. This mirrors the "Define sources" step for the Week 9 raw tables; the point is to write it again from scratch, not copy-paste it.
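A sketch of the shape such a file takes; the source name `raw` here is an assumption, and any name works as long as your `source()` calls in staging models match it:

```yaml
# models/staging/_sources.yml -- a sketch; the source name "raw" is a choice
version: 2

sources:
  - name: raw
    schema: public          # where Week 9 loaded the raw tables
    tables:
      - name: raw_trips
      - name: raw_zones
```

With this in place, staging models refer to the tables as `{{ source('raw', 'raw_trips') }}` rather than hard-coding schema-qualified names.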

Task 3: Build staging models

Create two staging models:

  1. stg_trips.sql: one row per trip. Include the columns you will need for the mart: pickup_datetime, pickup_location_id, fare_amount, tip_amount, trip_distance, plus a derived tip_pct using your own safe_divide macro (copy the macro from your Week 10 project). Drop rows where pickup_location_id is null or fare_amount < 0.
  2. stg_zones.sql: one row per TLC zone. Expose at least location_id and borough.

Materialize both as view.
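The stg_trips model described above might be sketched as follows. It assumes a safe_divide(numerator, denominator) macro copied from your Week 10 project and a source named `raw` in _sources.yml; adjust both to whatever your project actually uses:

```sql
-- models/staging/stg_trips.sql -- a sketch, not the reference solution.
-- Assumes: a safe_divide macro from Week 10, and a source named "raw".
select
    pickup_datetime,
    pickup_location_id,
    fare_amount,
    tip_amount,
    trip_distance,
    {{ safe_divide('tip_amount', 'fare_amount') }} as tip_pct
from {{ source('raw', 'raw_trips') }}
where pickup_location_id is not null   -- drop rows with no pickup zone
  and fare_amount >= 0                 -- drop negative fares
```

Because the filters live in staging, every downstream model inherits the same cleaned rows.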

Task 4: Build the mart

Create models/marts/fct_daily_borough_stats.sql materialized as table, grain one row per (pickup_borough, pickup_date). Join stg_trips to stg_zones on pickup_location_id, then aggregate. Columns to expose:

| Column | Type | Definition |
| --- | --- | --- |
| pickup_borough | text | Borough from stg_zones.borough |
| pickup_date | date | pickup_datetime::date |
| trip_count | bigint | count(*) |
| total_fare | numeric | sum(fare_amount) |
| avg_tip_pct | numeric | avg(tip_pct) |
| avg_trip_distance | numeric | avg(trip_distance) |

<aside> ⌨️ Sanity check: after dbt run --select +fct_daily_borough_stats, query your mart and confirm Manhattan has the highest trip_count on most days. If it does not, you likely joined on the wrong key or filtered too aggressively in staging.

</aside>
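The join-then-aggregate shape of the mart might look like the sketch below; column names on the zones side (location_id, borough) follow the Task 3 spec, and anything else is an assumption to adapt:

```sql
-- models/marts/fct_daily_borough_stats.sql -- a sketch of the required grain:
-- one row per (pickup_borough, pickup_date)
{{ config(materialized='table') }}

select
    z.borough                 as pickup_borough,
    t.pickup_datetime::date   as pickup_date,
    count(*)                  as trip_count,
    sum(t.fare_amount)        as total_fare,
    avg(t.tip_pct)            as avg_tip_pct,
    avg(t.trip_distance)      as avg_trip_distance
from {{ ref('stg_trips') }} t
join {{ ref('stg_zones') }} z
  on t.pickup_location_id = z.location_id
group by 1, 2
```

An inner join is used here on the assumption that staging already dropped null pickup_location_id rows; if you want to keep trips whose zone is missing from raw_zones, a left join with a coalesced borough is the alternative.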

Task 5: Add tests

Attach at least the following to your staging and mart models via YAML schema files:

Install dbt_utils via packages.yml + dbt deps before running the compound uniqueness test.
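A packages.yml that pulls in dbt_utils can be as small as the sketch below; the version range is an assumption, so pin whatever recent release your dbt version supports:

```yaml
# packages.yml -- the version range is an assumption; pin a recent release
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

Run dbt deps after editing this file so the package is installed before dbt build.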

Run dbt build (not dbt run + dbt test separately) and ensure the summary ends with ERROR=0. Warnings are acceptable.
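The project tree below also expects a singular test, tests/assert_avg_tip_pct_within_bounds.sql. A singular test passes when its query returns zero rows; one possible sketch, where the 0-to-1 bounds assume tip_pct is a fraction rather than percentage points:

```sql
-- tests/assert_avg_tip_pct_within_bounds.sql -- a sketch.
-- Assumes tip_pct is a fraction (0.25 = 25%); adjust bounds if yours differs.
-- Any rows returned here are failures.
select *
from {{ ref('fct_daily_borough_stats') }}
where avg_tip_pct < 0
   or avg_tip_pct > 1
```

dbt picks up any .sql file under tests/ automatically; no YAML registration is needed.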

Task 6: Document the project

Task 7: Answer the business questions

Query your mart to answer the stakeholder's questions. Save the queries and results in reports/answers.md:

  1. Which borough had the highest overall total_fare, summed across all of January 2024?
  2. Which day had the highest overall trip_count (sum across all boroughs)?
  3. What was the highest avg_tip_pct seen for any (borough, day) combination, and for which borough and date did it occur?
  4. What was the median daily trip_count for Manhattan vs Brooklyn?

Each answer must include: the SQL you ran, the result, and one sentence interpreting it.
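Question 4 is the least mechanical of the four, since the mart has no median column. In PostgreSQL, the percentile_cont ordered-set aggregate is one way to compute a median; a sketch, assuming you query the mart directly in your dev schema:

```sql
-- reports/answers.md, question 4 -- a sketch; schema-qualify the table as needed.
-- percentile_cont(0.5) interpolates the median of daily trip counts per borough.
select
    pickup_borough,
    percentile_cont(0.5) within group (order by trip_count) as median_daily_trip_count
from fct_daily_borough_stats
where pickup_borough in ('Manhattan', 'Brooklyn')
group by pickup_borough;
```

Each row of the result gives one borough's median daily trip_count over the days present in the mart.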

Task 8: AI Assist Report

Create AI_ASSIST.md describing:

<aside> ⚠️ Do not paste real data into an LLM. The NYC TLC dataset is public, so sample rows would be harmless here, but the discipline of scrubbing data before sharing it is exactly what you must practise.

</aside>

Technical requirements

nyc_taxi_borough_daily/
├── .gitignore                    # must exclude profiles.yml
├── dbt_project.yml
├── packages.yml
├── profiles.yml.example          # commit a sanitized copy (no password)
├── macros/
│   └── safe_divide.sql
├── models/
│   ├── staging/
│   │   ├── _sources.yml
│   │   ├── _stg_trips.yml
│   │   ├── _stg_zones.yml
│   │   ├── stg_trips.sql
│   │   └── stg_zones.sql
│   └── marts/
│       ├── _fct_daily_borough_stats.yml
│       └── fct_daily_borough_stats.sql
├── tests/
│   └── assert_avg_tip_pct_within_bounds.sql
├── docs/
│   └── lineage.png
├── reports/
│   └── answers.md
└── AI_ASSIST.md

Deliverables

Your mentor reviews against a detailed rubric covering correctness, modelling, testing, documentation, reproducibility, business-answer quality, and AI discipline. Ask them for specific feedback on any dimension you want to improve; they will share rubric-aligned notes in the PR review.

Extra reading

Submission

  1. Create a git branch week10/your-name in your submission repo.
  2. Commit your work with a clear message.