Week 2 - Structuring Data Pipelines
Welcome to Week 2! Now that you know the Python basics, it's time to move from "writing scripts" to "building engineering systems." This week is all about architecture. You will learn how to structure your code so that it is readable, testable, and robust against the messy reality of production data.
By the end of this week, you will have refactored a messy "God Script" into a professional, modular pipeline that separates configuration, data modeling, and business logic.
Learning goals
- Understand the core architecture of a production data pipeline (ETL vs ELT patterns)
- Manage configuration and sensitive secrets using .env files and a centralized config.py module (sketched below)
- Use Python Dataclasses to define structured data models and move away from generic dictionaries (sketched below)
- Distinguish between Object-Oriented and Functional programming paradigms and know when to apply each in a data context (sketched below)
- Implement Functional Composition to build readable and reusable transformation logic (sketched below)
- Master the principle of Separation of Concerns to decouple I/O operations (files/APIs) from transformation logic (sketched below)
- Write and execute automated unit tests using the Pytest framework (sketched below)
- Refactor messy, stateful code into a clean, modular, and well-tested engineering pipeline
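A minimal sketch of the .env pattern, assuming the python-dotenv package is installed and a git-ignored `.env` file sits in the project root; the variable names (`WEATHER_API_KEY`, `DATA_DIR`) are invented for illustration:

```python
# config.py -- the single place that reads the environment; the rest of
# the codebase imports settings from here instead of touching os.environ
import os

from dotenv import load_dotenv  # assumes: pip install python-dotenv

load_dotenv()  # loads key=value pairs from .env into the process environment

API_KEY = os.environ["WEATHER_API_KEY"]         # required: fail fast if missing
DATA_DIR = os.environ.get("DATA_DIR", "data/")  # optional, with a default
```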
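A sketch of a dataclass standing in for a "row as dict"; the `Measurement` model and its fields are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    station_id: str
    temperature_c: float
    timestamp: str

# With a plain dict, a misspelled key fails silently (or far downstream);
# a misspelled attribute raises AttributeError right here, and your editor
# can autocomplete the field names.
m = Measurement(station_id="AMS-01", temperature_c=18.4, timestamp="2024-05-01T12:00")
print(m.temperature_c)
```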
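One way to see the OOP/functional trade-off is to write the same transform both ways; this converter example is illustrative, not the course's own:

```python
# Functional style: a pure function -- no state, same output for the same
# input, trivially testable. Prefer this for transformation logic.
def to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32

# OOP style: worth it when behaviour and state genuinely belong together,
# e.g. a converter that also tracks how many values it has processed.
class TemperatureConverter:
    def __init__(self) -> None:
        self.processed = 0

    def convert(self, celsius: float) -> float:
        self.processed += 1
        return celsius * 9 / 5 + 32
```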
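A sketch of functional composition; the `compose` helper and the cleanup steps are made up for the example:

```python
from functools import reduce
from typing import Callable

def compose(*steps: Callable) -> Callable:
    """Return one function that applies each step from left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

def collapse_spaces(text: str) -> str:
    return " ".join(text.split())

# Each step is tiny and reusable; the pipeline reads like a sentence.
clean = compose(str.strip, str.lower, collapse_spaces)
print(clean("  Hello   WORLD  "))  # -> "hello world"
```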
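Separation of concerns in miniature: pure logic in the middle, thin I/O at the edges. The file path and column name here are illustrative:

```python
import csv
from pathlib import Path

# Logic: pure -- no files, no network -- so it can be unit-tested directly.
def filter_valid(rows: list[dict]) -> list[dict]:
    return [row for row in rows if row.get("temperature_c") not in ("", None)]

# I/O: a thin wrapper around the outside world.
def read_rows(path: Path) -> list[dict]:
    with path.open(newline="") as f:
        return list(csv.DictReader(f))

def main() -> None:
    rows = read_rows(Path("data/raw.csv"))  # I/O at the edge
    valid = filter_valid(rows)              # pure logic in the middle
    print(f"{len(valid)} valid rows")       # I/O at the edge

if __name__ == "__main__":
    main()
```

Because `filter_valid` never touches a file, a test can feed it a hand-built list of dicts and assert on the result.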
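A first pytest file, assuming the `to_fahrenheit` function from the paradigm sketch above lives in a hypothetical `transform.py`:

```python
# test_transform.py -- pytest auto-discovers files and functions named test_*
from transform import to_fahrenheit  # hypothetical module from the sketch above

def test_freezing_point():
    assert to_fahrenheit(0) == 32

def test_boiling_point():
    assert to_fahrenheit(100) == 212
```

Run `pytest` from the project root; plain `assert` statements are all you need, with no test classes or boilerplate required.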
Chapters
- Introduction to Data Pipelines
- Configuration & Secrets (.env)
- Separation of Concerns (I/O vs Logic)
- Dataclasses for Data Objects
- OOP vs Functional Programming
- Functional Composition
- Testing with Pytest
- Practice
- Assignment: Refactoring to a Clean Pipeline
- Gotchas & Pitfalls