Week 2 - Structuring Data Pipelines
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
*https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
Welcome to Week 2! Now that you know the Python basics, it's time to move from "writing scripts" to "building engineering systems." This week is all about architecture. You will learn how to structure your code so that it is readable, testable, and robust against the messy reality of production data.
By the end of this week, you will have refactored a messy "god script" into a professional, modular pipeline that survives missing files, malformed rows, and missing config, with configuration, data modeling, and business logic each living in its own module.
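As a rough illustration of what surviving missing files, malformed rows, and missing config can look like, here is a minimal sketch using only the standard library. The names `load_config`, `read_rows`, `parse_row`, `run`, and the `DATA_PATH` variable are hypothetical and not taken from the course material; the chapters introduce the course's own approach.

```python
import csv
import os
from pathlib import Path


def load_config(env=None):
    """Configuration: read settings from the environment, with a default."""
    env = os.environ if env is None else env
    # DATA_PATH is a hypothetical variable name; fall back if it is not set.
    return {"data_path": Path(env.get("DATA_PATH", "data/input.csv"))}


def read_rows(path):
    """I/O only: return raw CSV rows, or an empty list if the file is missing."""
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    except FileNotFoundError:
        return []


def parse_row(row):
    """Logic only: convert one raw row, or return None if it is malformed."""
    try:
        return {"name": row["name"].strip(), "amount": float(row["amount"])}
    except (KeyError, ValueError, TypeError, AttributeError):
        return None


def run(env=None):
    """Wire the pieces together and drop rows that could not be parsed."""
    config = load_config(env)
    parsed = (parse_row(row) for row in read_rows(config["data_path"]))
    return [row for row in parsed if row is not None]
```

Because `parse_row` never touches the file system, it can be checked with plain dictionaries, while `read_rows` is the only function that needs a real file.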
Learning goals
- Understand the core architecture of a production data pipeline (ETL vs ELT patterns)
- Manage configuration and sensitive secrets using .env files and a centralized config.py module
- Use Python dataclasses to define structured data models and move away from generic dictionaries
- Distinguish between Object-Oriented and Functional programming paradigms and know when to apply each in a data context
- Apply Functional Composition to build readable and reusable transformation logic
- Apply Separation of Concerns to decouple I/O operations (files/APIs) from transformation logic
- Write and execute automated unit tests using the Pytest framework
- Lint and format Python code using Ruff, and configure format-on-save in VS Code
- Refactor messy, stateful code into a clean, modular, and well-tested engineering pipeline
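To make a few of these goals concrete, the sketch below combines a dataclass, two small pure transformation steps, functional composition, and a Pytest-style test. It is only an illustration under assumed names: `Order`, `normalize_customer`, `round_amount`, `compose`, and `clean_order` are hypothetical and do not come from the course repository.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Order:
    """A structured record instead of a generic dictionary."""
    customer: str
    amount: float


def normalize_customer(order):
    """Pure step: return a copy with a trimmed, lowercased customer name."""
    return replace(order, customer=order.customer.strip().lower())


def round_amount(order):
    """Pure step: return a copy with the amount rounded to two decimals."""
    return replace(order, amount=round(order.amount, 2))


def compose(*steps):
    """Combine steps into one function that applies them left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


clean_order = compose(normalize_customer, round_amount)


def test_clean_order():
    # Pytest collects functions named test_*; a plain assert is enough.
    assert clean_order(Order("  Alice ", 19.999)) == Order("alice", 20.0)
```

Saved in a file whose name starts with `test_`, this can be run with the `pytest` command. Each step takes an `Order` and returns a new one, so steps can be tested alone or reordered without hidden state.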
Chapters
- Introduction to Data Pipelines
- Configuration & Secrets (.env)
- Separation of Concerns (I/O vs Logic)
- OOP vs Functional Programming
- Dataclasses for Data Objects
- Functional Composition
- Testing with Pytest
- Linting and Formatting with Ruff
- Practice
- Gotchas & Pitfalls
- Assignment: Refactoring to a Clean Pipeline
Supplementary
- Glossary: every term introduced this week, in chapter order, with stable anchors.
- Going Further: optional deeper resources, such as books, full courses, and advanced videos.
- Career Relevance: how this week's skills show up in data job postings in the Netherlands (NL).
- Teachers (GitHub only, not on Notion): the lesson plan, with live-quiz answer cribs, workshop reference solutions, and the assignment rubric. Deliberately excluded from scripts/notion_mapping.json so students cannot read the answer key.