Week 3 - Ingesting and Validating Data
Welcome to Week 3! Now that you can structure clean pipelines, it's time to connect them to the real world. This week is all about data ingestion: pulling data from APIs, reading various file formats, and validating incoming data before it corrupts your pipeline.
By the end of this week, you will have built a robust ingestion system that can handle messy external data sources, validate the structure and content of incoming records, and safely load them into a database.
Learning goals
- Understand the core challenges of data ingestion: schema mismatches, missing fields, and malformed data
- Ingest data from REST APIs using `requests` and handle pagination, authentication, and rate limiting (see the sketches after this list)
- Read and parse multiple file formats: CSV, JSON, Parquet, and Excel
- Use Pydantic for runtime data validation and automatic type coercion
- Write validated data to relational databases (SQLite, PostgreSQL) using SQL and ORMs
- Implement comprehensive error handling and logging to track failures and debug production issues
- Build an end-to-end ingestion pipeline that reads from an API, validates with Pydantic, and writes to a database
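To make the API goal concrete, here is a minimal sketch of paginated fetching with `requests`. The endpoint URL, the Bearer-token authentication, and the `results`/`next` fields in the response are assumptions about a hypothetical API, not a real service.

```python
import time

import requests


def fetch_all_pages(url: str, token: str) -> list[dict]:
    """Fetch every page of a paginated JSON API, backing off on HTTP 429."""
    headers = {"Authorization": f"Bearer {token}"}
    records: list[dict] = []
    page = 1
    while True:
        response = requests.get(url, headers=headers, params={"page": page}, timeout=10)
        if response.status_code == 429:
            # Rate limited: honour Retry-After if present, then retry the same page.
            time.sleep(int(response.headers.get("Retry-After", "5")))
            continue
        response.raise_for_status()  # fail loudly on any other HTTP error
        payload = response.json()
        records.extend(payload["results"])  # assumed response shape
        if payload.get("next") is None:     # assumed "no more pages" marker
            break
        page += 1
    return records


# Usage (hypothetical endpoint and token):
# orders = fetch_all_pages("https://api.example.com/v1/orders", token="...")
```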
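For the file-format goal, pandas gives you one DataFrame interface over all four formats. Note that `read_parquet` and `read_excel` depend on optional engines (pyarrow and openpyxl are common choices), and the file names here are placeholders.

```python
import pandas as pd

# Each reader returns a DataFrame, so downstream validation code
# does not need to care which format the data arrived in.
df_csv = pd.read_csv("orders.csv")              # delimited text
df_json = pd.read_json("orders.json")           # records or column-oriented JSON
df_parquet = pd.read_parquet("orders.parquet")  # columnar binary; needs pyarrow or fastparquet
df_excel = pd.read_excel("orders.xlsx")         # needs an engine such as openpyxl
```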
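Validation with Pydantic might look like the sketch below; the `Order` model and its fields are invented for illustration. Pydantic coerces compatible types (the string `"42"` becomes the int `42`) and raises `ValidationError` when coercion fails, which lets you quarantine bad records instead of crashing mid-batch.

```python
from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    # Pydantic coerces compatible inputs: "42" -> 42, "19.99" -> 19.99
    order_id: int
    customer_email: str
    amount: float


def validate_records(raw_records: list[dict]) -> tuple[list[Order], list[dict]]:
    """Split raw records into validated models and rejects."""
    valid: list[Order] = []
    rejected: list[dict] = []
    for raw in raw_records:
        try:
            valid.append(Order(**raw))
        except ValidationError:
            rejected.append(raw)  # keep rejects for later inspection
    return valid, rejected
```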
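Finally, a sketch of the database-writing step, reusing the hypothetical `Order` model from the previous sketch together with Python's built-in `sqlite3` and `logging` modules. Parameterized queries guard against SQL injection, and integrity errors are logged and skipped rather than allowed to abort the whole batch.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ingestion")


def write_orders(db_path: str, orders: list[Order]) -> None:
    """Insert validated Order models into SQLite using parameterized queries."""
    with sqlite3.connect(db_path) as conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS orders (
                   order_id INTEGER PRIMARY KEY,
                   customer_email TEXT NOT NULL,
                   amount REAL NOT NULL
               )"""
        )
        for order in orders:
            try:
                conn.execute(
                    "INSERT INTO orders VALUES (?, ?, ?)",
                    (order.order_id, order.customer_email, order.amount),
                )
            except sqlite3.IntegrityError:
                # Duplicate primary key: log and skip rather than abort the batch.
                logger.warning("Skipping duplicate order_id=%s", order.order_id)
    logger.info("Processed %d record(s) into %s", len(orders), db_path)
```

Chained together (fetch, then validate, then write), these sketches preview the shape of this week's assignment pipeline.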
Chapters
- Introduction to Data Ingestion
- Production Error Handling
- Ingesting from APIs
- Reading File Formats (CSV, JSON, Parquet)
- Data Validation with Pydantic
- Writing to Databases
- Practice
- Assignment: Build a Validated Ingestion Pipeline
- Gotchas & Pitfalls