Week 2 - Structuring Data Pipelines
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*
Found a mistake or have a suggestion? Let us know in the feedback form.
Introduction to Data Pipelines
Configuration & Secrets (.env)
Separation of Concerns (I/O vs Logic)
OOP vs Functional Programming
Dataclasses for Data Objects
Functional Composition
Testing with Pytest
Linting and Formatting with Ruff
Practice
Assignment: Refactoring to a Clean Pipeline
Gotchas & Pitfalls
Lesson Plan
Welcome to Week 2! Now that you know the Python basics, it's time to move from "writing scripts" to "building engineering systems." This week is all about architecture. You will learn how to structure your code so that it is readable, testable, and robust against the messy reality of production data.
By the end of this week, you will have refactored a messy "god script" into a professional, modular pipeline that separates configuration, data modeling, and business logic.
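To give a feel for the "separate configuration" part, here is a minimal `config.py` sketch, assuming secrets live in a `.env` file that has been loaded into the environment (for example via `python-dotenv`'s `load_dotenv()`). The variable names `API_KEY` and `DB_URL` are illustrative, not taken from the course materials.

```python
import os

# Centralized configuration: every other module imports from here
# instead of reading os.environ directly.
# API_KEY and DB_URL are example names, not course-specific.
API_KEY = os.environ.get("API_KEY", "")
DB_URL = os.environ.get("DB_URL", "sqlite:///local.db")

if __name__ == "__main__":
    # Never print real secrets; this just shows the fallback default.
    print(DB_URL)
```

Keeping all environment lookups in one module means the rest of the pipeline can be tested without touching real credentials.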
Learning goals
- Understand the core architecture of a production data pipeline (ETL vs ELT patterns)
- Manage configuration and sensitive secrets using .env files and a centralized config.py module
- Use Python Dataclasses to define structured data models and move away from generic dictionaries
- Distinguish between Object-Oriented and Functional programming paradigms and know when to apply each in a data context
- Implement Functional Composition to build readable and reusable transformation logic
- Master the principle of Separation of Concerns to decouple I/O operations (files/APIs) from transformation logic
- Write and execute automated unit tests using the Pytest framework
- Lint and format Python code using Ruff, and configure format-on-save in VS Code
- Refactor messy, stateful code into a clean, modular, and well-tested engineering pipeline
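Several of these goals can be previewed in one small sketch: a frozen dataclass as the data model, pure transformation functions, and a `compose` helper that chains them. All names here (`User`, `normalize_email`, `title_case_name`, `compose`) are illustrative examples, not identifiers from the course materials.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable

@dataclass(frozen=True)
class User:
    """Structured data model instead of a generic dict."""
    name: str
    email: str

def normalize_email(user: User) -> User:
    # Pure function: returns a new User, never mutates its input.
    return replace(user, email=user.email.strip().lower())

def title_case_name(user: User) -> User:
    return replace(user, name=user.name.title())

def compose(*funcs: Callable[[User], User]) -> Callable[[User], User]:
    """Chain transformations left to right into one reusable step."""
    def composed(user: User) -> User:
        return reduce(lambda acc, f: f(acc), funcs, user)
    return composed

clean = compose(normalize_email, title_case_name)

raw = User(name="ada lovelace", email="  ADA@Example.COM ")
print(clean(raw))  # → User(name='Ada Lovelace', email='ada@example.com')
```

Because every transformation is a pure function with no file or API access, each one can be unit-tested in isolation with Pytest (e.g. `assert normalize_email(raw).email == "ada@example.com"`), which is exactly the payoff of separating I/O from logic.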
First lesson: Introduction to Data Pipelines
Lesson plan