Week 9 - SQL for Analytics
Introduction to SQL for Analytics
Joins, CTEs, and Aggregations
3. Data Validation Queries
5. Data Modeling Concepts
6. Building Views in Azure PostgreSQL
7. Gotchas & Pitfalls
8. Practice
9. Assignment
Lesson Plan
7. Gotchas & Pitfalls
Gotchas & Pitfalls
Content coming soon...
Suggested Topics
- Accidental cartesian joins: forgetting a JOIN condition on trips and zones and getting millions of rows instead of thousands
- NULL behavior in aggregations: COUNT(*) vs COUNT(column), NULL in GROUP BY
- Confusing WHERE and HAVING: filtering before vs after aggregation
- Ambiguous column names in multi-table joins without aliases (both trips and zones have an id-like column)
- Wrong grain in a fact table: duplicating metrics when joining at different levels
- Overusing SELECT * in analytical queries: performance and readability costs
- Assuming referential integrity exists when it does not: always run a validation query to check whether any trips have pickuplocationid values not present in the zones table
- Not testing views after creation: stale results when underlying data changes
- Loading too much data: the course uses 1 month (January 2024, ~57K rows) of green taxi data; do not load the full NYC taxi dataset (millions of rows per month) to avoid filling up Azure PostgreSQL storage
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
*https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.