Week 10 - SQL Transformations with dbt

dbt Tests

In Materializations & Layers you built fct_trips, a mart materialized as a table and queried by dashboards. The question every analytics engineer faces next is: how do you know, tomorrow morning, that the numbers are still right?

In Week 9 you wrote validation queries by hand: one-shot SQL that counted duplicates, found NULLs, and checked referential integrity. Those were the right queries; the problem was the workflow. They lived in a notebook or a .sql file somewhere, they ran only when someone remembered to run them, and nothing connected them to the models whose data they checked.

dbt ships four generic tests as YAML declarations next to your models (reusable assertions like not_null on pickup_datetime or unique on location_id), plus support for singular tests (a one-off SQL query that returns the failing rows, like trips where pickup_datetime > dropoff_datetime) and unit tests (mocked-input assertions on transformation logic, like "given payment_type = 1, expect payment_type_label = 'Credit card'"). When you run dbt test, every test runs against the database; a non-zero exit stops downstream jobs. That is how "make sure the mart is still right" becomes part of the pipeline instead of a thing someone forgets.

By the end of this chapter, you should be able to:

- Attach the four built-in generic tests (not_null, unique, accepted_values, relationships) to columns in schema YAML.
- Write singular tests for cross-column business rules and unit tests for transformation logic with mocked inputs.
- Choose between warn and error severity, and debug a failing test from its compiled SQL.
- Run models and tests together in DAG order with dbt build.

The four built-in generic tests

Generic tests are attached to columns in a schema YAML file that lives next to your models. The same file also carries column descriptions (covered in Docs & Extras), so this is the one YAML your models refer to most often.

Create models/staging/_stg_trips.yml:

# models/staging/_stg_trips.yml
version: 2

models:
  - name: stg_trips
    description: Cleaned green taxi trips, one row per trip.
    columns:
      - name: pickup_datetime
        description: When the trip started.
        tests:
          - not_null
      - name: pickup_location_id
        description: TLC zone ID where the trip started.
        tests:
          - not_null
          - relationships:
              to: ref('stg_zones')
              field: location_id
      - name: payment_type
        description: TLC payment code (1-6).
        tests:
          - not_null:
              severity: warn        # see "Test severity" section below
          - accepted_values:
              values: [1, 2, 3, 4, 5, 6]

The tests: key under each column is where dbt finds the generic tests to run. Every test compiles to a SQL query that returns the bad rows; the test passes when the query returns zero rows.

Three of the four built-in tests appear above (not_null, accepted_values, relationships); unique is missing because stg_trips has no single-column primary key: raw_trips does not ship with a trip ID. The unique section below covers the two ways dbt projects handle that.

<aside> 💡 Naming a schema file _stg_trips.yml (with the underscore prefix) sorts it above the .sql model in file explorers. This is convention, not a rule. Some teams use one schema.yml per folder; others use one YAML per model. Pick one style per project and stick with it.

</aside>

not_null: the column is never NULL

columns:
  - name: pickup_datetime
    tests:
      - not_null

dbt compiles this to:

select pickup_datetime from {{ ref('stg_trips') }} where pickup_datetime is null

If the query returns any rows, the test fails and dbt prints the row count.

Rule of thumb: attach not_null to every column a downstream model or dashboard joins on or groups by. A NULL in a join key silently drops rows; a NULL in a group-by key creates a phantom bucket.
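Both failure modes are easy to reproduce outside dbt. The sketch below uses an in-memory SQLite table with made-up rows (column names borrowed from the chapter; the data is hypothetical) to show each one: the join silently drops the NULL-key row, and the group-by grows a NULL bucket.

```python
# Illustration outside dbt: why not_null belongs on join and group-by keys.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table stg_trips (pickup_location_id integer, fare_amount real);
create table stg_zones (location_id integer primary key, borough text);
insert into stg_trips values (1, 10.0), (2, 12.0), (NULL, 9.0);
insert into stg_zones values (1, 'Manhattan'), (2, 'Brooklyn');
""")

# Inner join on the key: the NULL-key trip disappears without an error.
joined = conn.execute("""
    select count(*) from stg_trips t
    join stg_zones z on t.pickup_location_id = z.location_id
""").fetchone()[0]
print(joined)  # 2 -- only 2 of the 3 trips survive the join

# Group by the key: the NULL shows up as a phantom bucket.
buckets = conn.execute("""
    select pickup_location_id, count(*) from stg_trips
    group by pickup_location_id
""").fetchall()
print(len(buckets))  # 3 -- one bucket is keyed by NULL
```

Neither query raises an error, which is exactly why the assertion has to live in a test rather than in the query itself.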

<aside> ⚠️ Do not blindly apply not_null to every column. Test what should never be NULL (join keys, group-by keys, identifiers), not what happens to be populated today. In the January 2024 data, congestion_surcharge is legitimately NULL for some rows: a not_null test on it would fail every run for a non-bug.

</aside>

unique: no value repeats

unique + not_null on the same column is the classic primary-key assertion: the column has a value on every row, and no value appears twice. stg_zones.location_id is the natural fit: each TLC zone ID appears exactly once. Here is the complete _stg_zones.yml:

# models/staging/_stg_zones.yml
version: 2

models:
  - name: stg_zones
    description: One row per TLC taxi zone (265 zones total).
    columns:
      - name: location_id
        description: TLC zone ID.
        tests:
          - unique
          - not_null
      - name: borough
        description: NYC borough (Manhattan, Brooklyn, Queens, Bronx, Staten Island, EWR, Unknown).
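Like not_null, unique compiles to a bad-rows query. For stg_zones.location_id the compiled SQL looks roughly like this (a sketch: dbt's actual output wraps the relation in adapter-specific quoting and a subquery, but the logic is the same):

```sql
-- approximate compiled form of the unique test on stg_zones.location_id
select
    location_id as unique_field,
    count(*) as n_records
from stg_zones
where location_id is not null
group by location_id
having count(*) > 1
```

Any location_id that appears on two or more rows comes back as a failing row.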

What about stg_trips? raw_trips does not ship with a trip ID; there is no single column you can declare unique. Two options:

  1. Add a surrogate key in staging (model change). Edit stg_trips.sql to compute a hash column with dbt_utils.generate_surrogate_key, then test the new column in YAML:
   -- in stg_trips.sql
   select
       ...,
       {{ dbt_utils.generate_surrogate_key(['pickup_datetime', 'dropoff_datetime', 'pickup_location_id', 'fare_amount']) }} as trip_id
   # in _stg_trips.yml
   - name: trip_id
     tests: [unique, not_null]

Picking enough columns to make the combination unique per trip is the hard part: two trips that share pickup time, dropoff time, pickup zone, and fare will still collide, which is why dbt projects typically include 4+ columns in the hash.

  2. Test compound uniqueness (YAML-only). Use dbt_utils.unique_combination_of_columns over the same column list, without touching stg_trips.sql:
   # in _stg_trips.yml
   models:
     - name: stg_trips
       tests:
         - dbt_utils.unique_combination_of_columns:
             combination_of_columns: [pickup_datetime, dropoff_datetime, pickup_location_id, fare_amount]

<aside> ⚠️ Notice the YAML shape: this test sits under models:[...].tests: (model-level), not under columns:[...].tests: like every test you have seen so far. The reason is that the assertion involves multiple columns at once, so it cannot be attached to a single column. Get the indentation wrong and dbt reports a parse error rather than a test failure: re-check the indent before you blame the test.

</aside>

Both patterns show up in real dbt projects. For this week, the practice exercise uses the compound approach (Option 2) so you do not have to modify stg_trips.sql.

<aside> ⚠️ The compound test on the January 2024 dataset finds 4 real duplicates (rows where every column is identical, including dropoff_datetime and trip_distance: adding more columns to the combination does not eliminate them). That is not a bug in the test or your column choice; it is a genuine data-quality issue in the TLC source. The reference project configures this test with severity: warn, so dbt test emits a WARN 4 line and keeps going rather than failing the build. Treat it as the kind of finding dbt test exists to surface; the fix lives upstream of dbt.

</aside>

Before you run either test, you need the community package that hosts them.

<aside> 💡 Both options require the dbt_utils package: add it to a packages.yml file and run dbt deps before your next dbt test. Package management is covered in Docs & Extras; the practice exercise includes the install step so you do not have to improvise it.

</aside>
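For reference, the packages.yml that install step creates is small. A sketch (the version range is an assumption: pin whatever dbt_utils release matches your dbt version):

```yaml
# packages.yml -- project root, next to dbt_project.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

After adding the file, dbt deps downloads the package into dbt_packages/ and the dbt_utils tests become available.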

accepted_values: the column only contains known values

columns:
  - name: payment_type
    tests:
      - accepted_values:
          values: [1, 2, 3, 4, 5, 6]

The TLC payment codes are documented as 1 through 6. Anything outside that set means the source data changed (a new code was added) or the ingestion landed corrupt rows. Either way, you want to know before the dashboard shows an unlabeled bucket.

<aside> ⚠️ accepted_values ignores NULLs: a NULL value is neither in nor out of the set, so it does not fail the test. In the January 2024 data ~6% of rows have payment_type IS NULL. That is why the YAML above pairs accepted_values with not_null: severity: warn on the same column: accepted_values covers "an unexpected code appeared," not_null covers "the code is missing entirely," and warn keeps the existing NULLs from blocking CI while still surfacing the count.

</aside>
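The three-valued-logic reason is easy to verify outside dbt: NULL NOT IN (...) evaluates to NULL, not TRUE, so a NULL row never lands in the compiled test's failure set. A minimal SQLite sketch (same semantics in PostgreSQL):

```python
# Why accepted_values ignores NULLs: SQL three-valued logic.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (payment_type integer)")
conn.executemany("insert into t values (?)", [(1,), (7,), (None,)])

# The shape of the compiled accepted_values test: rows outside the set.
bad = conn.execute(
    "select payment_type from t where payment_type not in (1,2,3,4,5,6)"
).fetchall()
print(bad)  # [(7,)] -- the NULL row is neither in nor out of the set
```

The unexpected code 7 fails; the NULL slips through, which is the gap not_null covers.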

Once you have both tests in place, a Jinja trick from SQL and Jinja Templating keeps the allowed-values list in sync with the labelling logic you already wrote.

<aside> 💡 Combine accepted_values with the Jinja {% for %} mapping you wrote in SQL and Jinja Templating: the same dictionary of {1: 'Credit card', 2: 'Cash', ...} becomes the source of truth for both the payment_type_label column and the accepted_values list. When a new code appears, the mapping and the test need to be updated in lockstep.

</aside>

<aside> 💡 Using AI to help: LLMs are strong at generating unit-test given: / expect: blocks for a model once you give them the model SQL and a sentence describing the business rule. Prompt: "Write a dbt unit test in YAML that validates this case logic handles codes 1 through 6 plus a NULL. Model SQL: <paste>." Review every generated row before running dbt test: models sometimes invent columns the source does not have, and silent column invention is a classic unit-test-that-passes-for-the-wrong-reason. ⚠️ Sample rows for the public TLC dataset are safe to paste; for real projects, use synthetic rows in prompts, not real customer data.

</aside>

relationships: foreign keys always resolve

columns:
  - name: pickup_location_id
    tests:
      - relationships:
          to: ref('stg_zones')
          field: location_id
          severity: warn

This asserts that every pickup_location_id value in stg_trips exists as a location_id in stg_zones. In Week 9 you wrote this as a join-and-check validation query; relationships is the same check, declarative and wired into the DAG.

When the test fails, dbt tells you how many trip rows have a pickup zone that is not in the lookup table. That is usually a sign of stale raw_zones data or an upstream ingestion bug: worth knowing before dashboard users spot a NULL borough.
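Under the hood, the relationships test above compiles to roughly an anti-join (a sketch: dbt's actual output uses generated CTE names and adapter-specific quoting):

```sql
-- approximate compiled form of the relationships test above
with child as (
    select pickup_location_id as from_field
    from stg_trips
    where pickup_location_id is not null
),
parent as (
    select location_id as to_field
    from stg_zones
)
select from_field
from child
left join parent
    on child.from_field = parent.to_field
where parent.to_field is null
```

Note the is not null filter on the child side: relationships, like accepted_values, does not count NULL keys as failures. That is another reason it pairs with not_null on the same column.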

<aside> 💡 A note on severity for FK tests. The example above leaves the test at the default error severity, which is the right starting point. Many real projects soften FK tests to severity: warn once they have lived with the data for a while: foreign-key drift between raw_trips and raw_zones is the kind of thing that will eventually happen as new months are ingested (a new zone added by the TLC, a stale raw_zones snapshot), and a single broken row should not block the whole pipeline at 03:00. The Test severity section below shows the syntax: for January 2024 every pickup happens to resolve, so the test passes either way.

</aside>

<aside> 💡 In the wild: The dbt-labs/dbt-utils package is dbt-Labs' own open-source library of generic tests written in this same select bad rows style. Open macros/generic_tests/unique_combination_of_columns.sql to see the test you used in the staging YAML, in production form: it is just a SQL select wrapped in a Jinja {% test %} block, exactly the pattern this chapter teaches.

</aside>

Running tests

dbt test runs every test in the project:

dbt test

You get one pass/fail line per test, plus a summary. On the January 2024 dataset, a fully-configured nyc_taxi project prints something like this:

 1 of 11 PASS accepted_values_stg_trips_payment_type__1__2__3__4__5__6 .......... [PASS in 0.57s]
 2 of 11 PASS assert_pickup_before_dropoff ...................................... [PASS in 0.57s]
 3 of 11 WARN 4 dbt_utils_unique_combination_of_columns_stg_trips_pickup_datetime... [WARN 4 in 0.68s]
 4 of 11 PASS not_null_fct_trips_pickup_datetime ................................ [PASS in 0.57s]
 5 of 11 WARN 3415 not_null_stg_trips_payment_type .............................. [WARN 3415 in 0.52s]
 6 of 11 PASS not_null_stg_trips_pickup_datetime ................................ [PASS in 0.52s]
 7 of 11 PASS not_null_stg_trips_pickup_location_id ............................. [PASS in 0.50s]
 8 of 11 PASS not_null_stg_zones_location_id .................................... [PASS in 0.38s]
 9 of 11 PASS relationships_stg_trips_pickup_location_id__location_id__ref_stg_zones_  [PASS in 0.39s]
10 of 11 PASS unique_stg_zones_location_id ...................................... [PASS in 0.43s]
11 of 11 PASS stg_trips::payment_type_label_maps_known_codes .................... [PASS in 0.63s]
Done. PASS=9 WARN=2 ERROR=0 SKIP=0 NO-OP=0 TOTAL=11

Two real findings are embedded in that output:

- Line 3: the compound-uniqueness test warns on the 4 duplicate trips in the TLC source described earlier.
- Line 5: not_null on payment_type warns on 3,415 NULL rows, the ~6% of trips with no payment code.

ERROR=0 means nothing blocks a dbt build. The warnings are findings the team can triage without halting the pipeline.

Reading a failure

When a test fails, dbt writes the compiled SQL query to target/compiled/<project>/.... Open that file and run the query against your dev schema in psql: the rows it returns are the ones violating the assertion. This is the single most useful debugging technique for failing tests.

dbt test --select stg_trips
# dbt prints the path to the compiled failure query for each failed test. Open it.
# If you would rather discover the path than copy it from the log, list the folder:
ls target/compiled/nyc_taxi/models/staging/_stg_trips.yml/
# then `cat` whichever file matches the failing test name.

Selective testing

Same selector syntax as dbt run. Test one model, one tag, or an upstream/downstream slice:

dbt test --select stg_trips        # tests on stg_trips only
dbt test --select +fct_trips       # tests on fct_trips and all upstream models
dbt test --select tag:critical     # tests tagged "critical"
                                   # (requires tags: ['critical'] on the test in YAML first)

In CI, you run the whole suite. While developing a single model, you run the slice.
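The tag:critical selector needs the tag declared on the test first. A sketch of the YAML shape (the tag name critical is just a convention, not a dbt keyword):

```yaml
columns:
  - name: pickup_datetime
    tests:
      - not_null:
          config:
            tags: ['critical']
```

With that in place, dbt test --select tag:critical picks up this test, along with any model or test carrying the same tag.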

Singular tests: business rules that do not fit a generic

Generic tests cover the common patterns. Business rules like "pickup happens before dropoff" or "fare matches the sum of its components" do not. Those go in the tests/ folder as plain .sql files.

A singular test is a select that returns the bad rows. An empty result passes; any returned rows fail the test.

-- tests/assert_pickup_before_dropoff.sql
select
    pickup_datetime,
    dropoff_datetime,
    pickup_location_id
from {{ ref('stg_trips') }}
where pickup_datetime > dropoff_datetime

dbt test runs this alongside the generic tests. When it fails, the compiled query in target/compiled/ already is the "show me the bad rows" query you would write by hand: no further debugging setup needed.

Rule of thumb: reach for a singular test when the assertion involves a relationship between columns (not a property of one column), a time window, or any SQL you could not express by listing values in YAML.
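The other business rule mentioned above, "fare matches the sum of its components," follows the same pattern. A sketch, assuming stg_trips exposes fare_amount, tip_amount, and a total_amount column (check your staging model first; real TLC totals also include tolls and surcharges, so treat this as a shape rather than a ready-made check):

```sql
-- tests/assert_total_matches_components.sql (hypothetical)
-- Returns trips where the stored total drifts from fare + tip
-- by more than a cent.
select
    pickup_datetime,
    fare_amount,
    tip_amount,
    total_amount
from {{ ref('stg_trips') }}
where abs(total_amount - (fare_amount + tip_amount)) > 0.01
```

As with the pickup-before-dropoff test, an empty result passes and any returned rows fail.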

Unit tests: testing transformation logic

Data tests answer "is the data in the warehouse correct right now?" Unit tests answer a different question: "does the transformation logic in the model produce the output I expect, given controlled inputs?" They run against mocked rows, not the live database.

The payment_type_label logic from SQL and Jinja Templating is a good unit-test target. You built it from a Jinja dictionary; the assertion is "for each payment_type code, the label matches the dictionary."

The example below assumes stg_trips selects both payment_type and payment_type_label: added in Practice Exercise 1. If your stg_trips does not yet expose those columns, finish that exercise first or dbt will fail the unit test with a column-not-found error rather than a clean assertion diff.

# models/staging/_stg_trips.yml
unit_tests:
  - name: payment_type_label_maps_known_codes
    model: stg_trips
    given:
      - input: source('nyc_taxi', 'raw_trips')
        rows:
          - {payment_type: 1, pickup_datetime: '2024-01-01 08:00:00', pickup_location_id: 100, fare_amount: 10.0, tip_amount: 2.0, trip_distance: 2.0}
          - {payment_type: 2, pickup_datetime: '2024-01-01 09:00:00', pickup_location_id: 100, fare_amount: 10.0, tip_amount: 0.0, trip_distance: 2.0}
          - {payment_type: 6, pickup_datetime: '2024-01-01 10:00:00', pickup_location_id: 100, fare_amount: 10.0, tip_amount: 0.0, trip_distance: 2.0}
    expect:
      rows:
        - {payment_type: 1, payment_type_label: 'Credit card'}
        - {payment_type: 2, payment_type_label: 'Cash'}
        - {payment_type: 6, payment_type_label: 'Voided trip'}

dbt test --select "stg_trips,test_type:unit" runs the unit tests. The comma in the selector is an intersection, not a list: dbt runs tests that match stg_trips and are unit tests; a space would mean "or" and run a different set.

To execute a unit test, dbt replaces every {{ source(...) }} and {{ ref(...) }} call in the model with the rows you declared under given:, compiles the model's SQL (renames, type casts, the Jinja case that builds payment_type_label), executes it, and compares the output to expect:. So the input keys reference raw_trips columns and the expected keys reference stg_trips columns: payment_type_label only exists after the model's SQL runs. If the mapping logic breaks (say, someone edits the dictionary), the unit test fails without touching your warehouse data.

Columns the model selects but you did not list under given: (for example dropoff_datetime, dropoff_location_id) come through as NULL. Include only the columns your assertion actually depends on: here, payment_type drives the label logic, so the other fields stay minimal.

Rule of thumb: unit tests earn their keep for transformation logic that has branches: case statements, conditional {% if %}, macros with argument-dependent output. For a plain column rename, a unit test repeats what dbt run already proves.

Test severity: warn vs error

By default, a failing test is an error: dbt test exits non-zero and CI stops. Some tests are better as warnings: assertions about raw data you do not fully control, where a failure should notify the team but not block a deploy. The severity config controls this per test.

columns:
  - name: congestion_surcharge
    tests:
      - not_null:
          severity: warn

With severity: warn, a failure prints a yellow line and dbt test still exits 0. The test still runs, the result still shows up in logs and in dbt docs, but the pipeline keeps going.

Rule of thumb: default to error. Use warn when the test is about raw upstream data you cannot fix without waiting for an ingestion change, and when a dashboard showing slightly-stale data is better than a broken dashboard.
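Severity can also be threshold-based: dbt's warn_if / error_if configs let a test warn on small counts and only error past a limit. A sketch (the thresholds are made-up numbers; tune them to your data):

```yaml
columns:
  - name: payment_type
    tests:
      - not_null:
          config:
            severity: error
            warn_if: ">0"
            error_if: ">5000"
```

With this config, a handful of NULLs prints a warning, but a count above 5,000 (something structural, like an ingestion job writing empty columns) still fails the build.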

<aside> 💡 This is why congestion_surcharge earns not_null: severity: warn while a hypothetical ehail_fee column (see the not_null section: present in the full TLC schema, dropped by Week 9's ingestion) would earn no not_null test at all. congestion_surcharge is usually populated (most trips through Manhattan charge it), so a NULL count drifting up is worth surfacing in the logs. ehail_fee is almost always NULL (only e-hail dispatched trips charge it), so not_null would fail at every severity level without teaching you anything new.

</aside>

dbt build: models and tests together

dbt run builds models. dbt test runs tests. In production you almost always want both, and you want them interleaved in DAG order so a failing test on stg_trips stops dbt from building fct_trips on bad data.

dbt build does exactly that:

dbt build --select +fct_trips

For every selected model, dbt:

  1. Builds the model.
  2. Runs every test attached to that model.
  3. If any test fails, marks downstream models as skipped: they do not run at all.

The alternative is dbt run && dbt test, which builds every model first and only then checks for problems. By the time a not_null failure on stg_trips shows up, fct_trips has already been rebuilt from the bad staging data.

Rule of thumb: use dbt build in CI and scheduled runs. Use dbt run / dbt test separately only when you are iterating locally and want to skip the test phase on a quick change.

Putting it all together