Week 9 - SQL for Analytics
Introduction to SQL for Analytics
Joins, CTEs, and Aggregations
3. Data Validation Queries
5. Data Modeling Concepts
6. Building Views in Azure PostgreSQL
7. Gotchas & Pitfalls
8. Practice
9. Assignment
Lesson Plan
3. Data Validation Queries
Data Validation Queries
Content coming soon...
Suggested Topics
- Why validation at the SQL layer matters after ingestion
- Detecting duplicate records: GROUP BY with HAVING COUNT > 1 (e.g., duplicate trip records with the same pickup time and location)
- Finding NULL values and measuring completeness per column (e.g., how many trips have NULL pickuplocationid?)
- Row count validation: comparing source counts to destination counts
- Referential integrity checks: orphaned foreign keys (e.g., trips referencing location_ids not in the zones table)
- Range and format validation: dates out of range, negative fare amounts, unexpected payment types
- Building a reusable validation query library
- Automating validation as post-ingestion checks in your pipeline
- Logging validation results to a data quality table
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
*https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.
In the previous chapter, 9.2: Joins, CTEs, and Aggregations, you learned how to transform raw data into insights. But there is a massive assumption in that chapter: that the data is actually correct.
In production data engineering, this is a dangerous assumption. Pipes break, APIs change their format, and CSVs often arrive with "garbage" rows. If you build a dashboard on top of bad data, you aren't providing insights—you’re providing "hallucinations." This chapter is about using SQL to build a "firewall" around your data warehouse.