Week 2 - Structuring Data Pipelines
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*
Found a mistake or have a suggestion? Let us know in the feedback form.
Introduction to Data Pipelines
Configuration & Secrets (.env)
Separation of Concerns (I/O vs Logic)
OOP vs Functional Programming
Dataclasses for Data Objects
Functional Composition
Testing with Pytest
Linting and Formatting with Ruff
Practice
Assignment: Refactoring to a Clean Pipeline
Gotchas & Pitfalls
Lesson Plan
Welcome to Week 2! Now that you know the Python basics, it's time to move from "writing scripts" to "building engineering systems." This week is all about architecture. You will learn how to structure your code so that it is readable, testable, and robust against the messy reality of production data.
By the end of this week, you will have refactored a messy "god script" into a professional, modular pipeline that separates configuration, data modeling, and business logic.
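To give a feel for the "separate configuration" part, here is a minimal `config.py` sketch, assuming secrets live in a `.env` file that has been loaded into the environment (for example via `python-dotenv`'s `load_dotenv()`). The variable names `API_KEY` and `DB_URL` are illustrative, not taken from the course materials.

```python
import os

# Centralized configuration: every other module imports from here
# instead of reading os.environ directly.
# API_KEY and DB_URL are example names, not course-specific.
API_KEY = os.environ.get("API_KEY", "")
DB_URL = os.environ.get("DB_URL", "sqlite:///local.db")

if __name__ == "__main__":
    # Never print real secrets; this just shows the fallback default.
    print(DB_URL)
```

Keeping all environment lookups in one module means the rest of the pipeline can be tested without touching real credentials.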
Learning goals
- Understand the core architecture of a production data pipeline (ETL vs ELT patterns)
- Manage configuration and sensitive secrets using .env files and a centralized config.py module
- Use Python Dataclasses to define structured data models and move away from generic dictionaries
- Distinguish between Object-Oriented and Functional programming paradigms and know when to apply each in a data context
- Implement Functional Composition to build readable and reusable transformation logic
- Master the principle of Separation of Concerns to decouple I/O operations (files/APIs) from transformation logic
- Write and execute automated unit tests using the Pytest framework
- Lint and format Python code using Ruff, and configure format-on-save in VS Code
- Refactor messy, stateful code into a clean, modular, and well-tested engineering pipeline
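Several of these goals can be previewed in one small sketch: a frozen dataclass as the data model, pure transformation functions, and a `compose` helper that chains them. All names here (`User`, `normalize_email`, `title_case_name`, `compose`) are illustrative examples, not identifiers from the course materials.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable

@dataclass(frozen=True)
class User:
    """Structured data model instead of a generic dict."""
    name: str
    email: str

def normalize_email(user: User) -> User:
    # Pure function: returns a new User, never mutates its input.
    return replace(user, email=user.email.strip().lower())

def title_case_name(user: User) -> User:
    return replace(user, name=user.name.title())

def compose(*funcs: Callable[[User], User]) -> Callable[[User], User]:
    """Chain transformations left to right into one reusable step."""
    def composed(user: User) -> User:
        return reduce(lambda acc, f: f(acc), funcs, user)
    return composed

clean = compose(normalize_email, title_case_name)

raw = User(name="ada lovelace", email="  ADA@Example.COM ")
print(clean(raw))  # → User(name='Ada Lovelace', email='ada@example.com')
```

Because every transformation is a pure function with no file or API access, each one can be unit-tested in isolation with Pytest (e.g. `assert normalize_email(raw).email == "ada@example.com"`), which is exactly the payoff of separating I/O from logic.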
First lesson: Introduction to Data Pipelines
Lesson plan