Introduction to SQL for Analytics
6. Building Views in Azure PostgreSQL
The NYC Taxi trips are loaded into PostgreSQL, but the analytics team can't use them yet. The raw table has negative fares, missing zone references, and column names that only make sense if you wrote the ingestion script. Your job: audit the raw data, clean up what you can through views, and hand the team a star schema they can query with confidence.
This assignment focuses on analytics engineering fundamentals: moving from raw data to the data structure used by analysts.
You will audit data quality, design a dimensional model, and implement that model using SQL Views. This layer serves as the "source of truth" for all downstream reporting.
By the end of this assignment, you should be able to:
CREATE VIEW to abstract complexity for end-users.<aside>
โ ๏ธ Run all queries against raw_trips in your own schema (the one your teacher assigned you), not the shared public schema. Views you create must also live in your schema.
</aside>
Before modeling, you must understand the state of your raw_trips data: the same dataset used throughout the week. Write a single SQL script validation_queries.sql that answers the following:
vendor_id, pickup_datetime, and dropoff_datetime?NULL pickup_location_id or dropoff_location_id.fare_amount. Are there negative values?pickup_location_id values in the trips table that do not exist in the raw_zones table?Organize the NYC Taxi data into a star schema to make it easier to query.
vw_fact_trips (the quantitative events)vw_dim_zones (the descriptive attributes)Create a script schema_setup.sql with the following CREATE VIEW statements:
vw_dim_zones: Create a view based on raw_zones. Treat location_id as the primary key.vw_fact_trips: Create a view based on raw_trips.fare_amount is less than 0.pickup_datetime as a TIMESTAMP.vw_dim_zones on location_id. Verify with a test SELECT COUNT(*) FROM vw_fact_trips f JOIN vw_dim_zones d ON f.pickup_location_id = d.location_id after creating it.<aside>
๐ก Why Views? Views save storage and ensure that if the underlying raw_trips data is updated, your star schema reflects those changes instantly without a manual refresh.
</aside>
Create a file named data_dictionary.md. For both views, document:
fare_amount, passenger_count).<aside> ๐ก Using AI to help: Ask an LLM to review your grain statement and check whether your key definitions are consistent with a proper star schema. Then verify its answer against your actual view columns. (โ ๏ธ no real customer data, no PII)
</aside>
Query your new views to answer these specific questions. Save the queries in verification_results.sql with a comment above each one labeling the question it answers.
<aside>
๐ก Borough names and zone names live in vw_dim_zones, not in vw_fact_trips. For borough- or zone-level breakdowns, join on pickup_location_id = location_id.
</aside>
vw_fact_trips? How many rows per borough? What is the most common pickup/dropoff location combination?fare_amount? Which pickup zone had the highest total fare_amount in a single month?trip_distance for each borough?tip_amount? What hour of the day has the highest average tip?For question 1, take a screenshot of your query result showing the per-borough row counts. Save it as assets/borough_count.png.
Create a file named AI_ASSIST.md. While completing tasks 1-4, document one session where you used an LLM to help with a query or a design decision. Fill in these four sections:
<aside> โ ๏ธ Never paste real customer data or PII into an LLM. The NYC Taxi dataset used here is public data, safe to paste.
</aside>
Your submission should contain:
week9-attempt/
โโโ validation_queries.sql (Task 1: validation queries)
โโโ schema_setup.sql (Task 2: CREATE VIEW statements)
โโโ data_dictionary.md (Task 3: grain, keys, measures)
โโโ verification_results.sql (Task 4: verification queries)
โโโ assets/
โ โโโ borough_count.png (Task 4: screenshot of borough query result)
โโโ AI_ASSIST.md (Task 5: LLM session log)
SELECT, FROM, JOIN), consistent indentation.| Task | Points | What the teacher checks |
|---|---|---|
| Task 1: Data Quality Audit | 20 | All four queries run without error; findings match actual data |
| Task 2: Star Schema Views | 30 | Both views exist in your schema; fact view filters negatives; join-readiness test passes |
| Task 3: Data Dictionary | 20 | Grain is precise; keys are correctly identified; measures list is complete |
| Task 4: Verification Queries | 20 | Queries run and return results; screenshot shows borough counts |
| Task 5: AI Assist Log | 10 | All four sections filled; reflection shows real understanding |
Total: 100 points. Passing: 60.
Once you have given the assignment your best attempt (or the deadline is approaching):
main.Next up: Introduction to dbt Core, where you replace hand-written SQL views with version-controlled dbt models that run, test, and document themselves automatically.
The HackYourFuture curriculum is licensed underย CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with โค๏ธ by the HackYourFuture community ยท Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.