Week 3 - Ingesting and Validating Data
Welcome to Week 3! In Week 2 you refactored a messy script into a modular pipeline with separated config, models, and business logic. This week you connect that pipeline to the real world: pulling data from APIs, reading various file formats, and validating incoming data before it corrupts your pipeline.
By the end of this week, you will have built a robust ingestion system that can handle messy external data sources, validate their structure and content, and safely load them into databases.
Learning goals
- Understand the core challenges of data ingestion: schema mismatches, missing fields, and malformed data
- Ingest data from REST APIs using `requests` and handle pagination, authentication, and rate limiting
- Implement comprehensive error handling and logging to track failures and debug production issues
- Read and parse multiple file formats: CSV, JSON, and Parquet
- Use Pydantic for runtime data validation and automatic type coercion
- Write validated data to relational databases (SQLite) using SQL
- Build an end-to-end ingestion pipeline that reads from an API, validates with Pydantic, and writes to a database
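To make the last goal concrete, here is a minimal sketch of the validate-then-load part of such a pipeline. It is an illustration under stated assumptions, not the assignment solution: the `Reading` model, its fields, the `readings` table, and the sample records are all hypothetical, and the API call is replaced by a hardcoded list standing in for a parsed JSON response.

```python
import sqlite3

from pydantic import BaseModel, ValidationError


class Reading(BaseModel):
    """Hypothetical record shape; real fields depend on the API you ingest."""

    station_id: int
    city: str
    temperature_c: float


def validate_records(raw_records):
    """Split raw dicts into validated models and (record, error) rejections."""
    valid, rejected = [], []
    for record in raw_records:
        try:
            valid.append(Reading(**record))
        except ValidationError as exc:
            # Keep the bad record and the reason so failures can be logged.
            rejected.append((record, str(exc)))
    return valid, rejected


def load_readings(conn, readings):
    """Create the (hypothetical) table if needed and upsert validated rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "station_id INTEGER PRIMARY KEY, "
        "city TEXT NOT NULL, "
        "temperature_c REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO readings (station_id, city, temperature_c) "
        "VALUES (?, ?, ?)",
        [(r.station_id, r.city, r.temperature_c) for r in readings],
    )
    conn.commit()


# Stand-in for a parsed API response (assumed data, not from a real API).
raw_records = [
    {"station_id": "1", "city": "Amsterdam", "temperature_c": "18.5"},  # strings are coerced
    {"station_id": 2, "city": "Utrecht"},  # missing field
    {"station_id": 3, "city": "Rotterdam", "temperature_c": "n/a"},  # malformed value
]

valid, rejected = validate_records(raw_records)

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
load_readings(conn, valid)
rows = conn.execute(
    "SELECT station_id, city, temperature_c FROM readings"
).fetchall()

print(f"loaded {len(rows)} row(s), rejected {len(rejected)} record(s)")
```

Only the first record reaches the database; the other two are collected with their error messages instead of crashing the run, which is the behaviour the error-handling and validation chapters build toward.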
Prerequisites
- Completed Week 2: clean modular pipeline structure (config, models, transformations in separate modules), pytest, and ruff
- Python 3.11+ with `uv` for environment management
- Working SQLite (ships with Python; no separate install)
Chapters
- Introduction to Data Ingestion
- Ingesting from APIs
- Production Error Handling
- Reading Multiple File Formats
- Data Validation with Pydantic
- Writing to Databases
- Gotchas & Pitfalls
- Practice
- Assignment
Supplementary
- Career relevance: Week 3: how Week 3's ingestion stack (`requests`, retry logic, Pydantic, SQLite upserts) reads on a CV and in a junior data-role interview in the NL market.
- Week 3 Glossary: every ingestion, validation, and database term from this week, grouped by chapter with stable anchors that the chapters link back to.
- Going Further: Optional Deep Dives: longer-form videos, books, declarative-ingestion frameworks (dlt, Airbyte), and tooling that go beyond the tightly scoped Extra reading sections in Week 3's chapters. All optional.
- History: APIs and Data Transfer: how EDI, SOAP/XML, REST/JSON, and Protobuf/gRPC evolved, and what each generation left behind. Optional background reading for the curious.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
*https://hackyourfuture.net/*
