Week 3 - Ingesting and Validating Data
Welcome to Week 3! Now that you can structure clean pipelines, it's time to connect them to the real world. This week is all about data ingestion: pulling data from APIs, reading various file formats, and validating incoming data before it corrupts your pipeline.
By the end of this week, you will have built a robust ingestion system that can handle messy external data sources, validate the structure and content of incoming records, and safely load them into a database.
Learning goals
- Understand the core challenges of data ingestion: schema mismatches, missing fields, and malformed data
- Ingest data from REST APIs using `requests` and handle pagination, authentication, and rate limiting (see the sketches after this list)
- Read and parse multiple file formats: CSV, JSON, Parquet, and Excel
- Use Pydantic for runtime data validation and automatic type coercion
- Write validated data to relational databases (SQLite, PostgreSQL) using SQL and ORMs
- Implement comprehensive error handling and logging to track failures and debug production issues
- Build an end-to-end ingestion pipeline that reads from an API, validates with Pydantic, and writes to a database
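To make the API goal concrete, here is a minimal sketch of paginated fetching with `requests`. The endpoint URL, the Bearer-token authentication, and the `results`/`next` fields in the response are assumptions about a hypothetical API, not a real service.

```python
import time

import requests


def fetch_all_pages(url: str, token: str) -> list[dict]:
    """Fetch every page of a paginated JSON API, backing off on HTTP 429."""
    headers = {"Authorization": f"Bearer {token}"}
    records: list[dict] = []
    page = 1
    while True:
        response = requests.get(url, headers=headers, params={"page": page}, timeout=10)
        if response.status_code == 429:
            # Rate limited: honour Retry-After if present, then retry the same page.
            time.sleep(int(response.headers.get("Retry-After", "5")))
            continue
        response.raise_for_status()  # fail loudly on any other HTTP error
        payload = response.json()
        records.extend(payload["results"])  # assumed response shape
        if payload.get("next") is None:     # assumed "no more pages" marker
            break
        page += 1
    return records


# Usage (hypothetical endpoint and token):
# orders = fetch_all_pages("https://api.example.com/v1/orders", token="...")
```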
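For the file-format goal, pandas gives you one DataFrame interface over all four formats. Note that `read_parquet` and `read_excel` depend on optional engines (pyarrow and openpyxl are common choices), and the file names here are placeholders.

```python
import pandas as pd

# Each reader returns a DataFrame, so downstream validation code
# does not need to care which format the data arrived in.
df_csv = pd.read_csv("orders.csv")              # delimited text
df_json = pd.read_json("orders.json")           # records or column-oriented JSON
df_parquet = pd.read_parquet("orders.parquet")  # columnar binary; needs pyarrow or fastparquet
df_excel = pd.read_excel("orders.xlsx")         # needs an engine such as openpyxl
```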
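Validation with Pydantic might look like the sketch below; the `Order` model and its fields are invented for illustration. Pydantic coerces compatible types (the string `"42"` becomes the int `42`) and raises `ValidationError` when coercion fails, which lets you quarantine bad records instead of crashing mid-batch.

```python
from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    # Pydantic coerces compatible inputs: "42" -> 42, "19.99" -> 19.99
    order_id: int
    customer_email: str
    amount: float


def validate_records(raw_records: list[dict]) -> tuple[list[Order], list[dict]]:
    """Split raw records into validated models and rejects."""
    valid: list[Order] = []
    rejected: list[dict] = []
    for raw in raw_records:
        try:
            valid.append(Order(**raw))
        except ValidationError:
            rejected.append(raw)  # keep rejects for later inspection
    return valid, rejected
```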
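Finally, a sketch of the database-writing step, reusing the hypothetical `Order` model from the previous sketch together with Python's built-in `sqlite3` and `logging` modules. Parameterized queries guard against SQL injection, and integrity errors are logged and skipped rather than allowed to abort the whole batch.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ingestion")


def write_orders(db_path: str, orders: list[Order]) -> None:
    """Insert validated Order models into SQLite using parameterized queries."""
    with sqlite3.connect(db_path) as conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS orders (
                   order_id INTEGER PRIMARY KEY,
                   customer_email TEXT NOT NULL,
                   amount REAL NOT NULL
               )"""
        )
        for order in orders:
            try:
                conn.execute(
                    "INSERT INTO orders VALUES (?, ?, ?)",
                    (order.order_id, order.customer_email, order.amount),
                )
            except sqlite3.IntegrityError:
                # Duplicate primary key: log and skip rather than abort the batch.
                logger.warning("Skipping duplicate order_id=%s", order.order_id)
    logger.info("Processed %d record(s) into %s", len(orders), db_path)
```

Chained together (fetch, then validate, then write), these sketches preview the shape of this week's assignment pipeline.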
Chapters
- Introduction to Data Ingestion
- Production Error Handling
- Ingesting from APIs
- Reading File Formats (CSV, JSON, Parquet)
- Data Validation with Pydantic
- Writing to Databases
- Practice
- Assignment: Build a Validated Ingestion Pipeline
- Gotchas & Pitfalls