Week 2 - Structuring Data Pipelines

4. Dataclasses for Data Objects

Goal: Structure your data instead of passing raw dictionaries around.

Concepts to Cover

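The Problem with Dicts: row['price'] raises a KeyError at runtime if the key is misspelled or missing, and nothing warns you before the code runs.

The Solution: @dataclass gives you attribute access (row.price) on declared, typed fields, so your editor or type checker can flag a misspelled field before the code runs.

Validation: Add a __post_init__ method to check values (e.g., price > 0) as soon as the object is created.

Methods: Add helper methods to your data objects so that logic about the data lives with the data (encapsulation).

Serialization: Use dataclasses.asdict to convert an object back to a dict, which can then be dumped to JSON.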
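
The concepts above can be sketched together in one small example. This is a minimal illustration, not part of the course material: the Product class, its fields (name, price, quantity), and the total and to_json helpers are hypothetical names chosen for the sketch. Note that dataclass type hints are not enforced at runtime, which is why the value check lives in __post_init__.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    """One record in the pipeline (hypothetical example fields)."""

    name: str
    price: float
    quantity: int = 1

    def __post_init__(self) -> None:
        # Validation: runs automatically right after the generated __init__.
        if self.price <= 0:
            raise ValueError(f"price must be > 0, got {self.price}")

    def total(self) -> float:
        # Helper method: logic about the data lives with the data.
        return self.price * self.quantity

    def to_json(self) -> str:
        # Serialization: asdict() turns the instance back into a plain dict.
        return json.dumps(asdict(self))


# The problem with dicts: a misspelled key only fails when the line runs.
row = {"name": "Widget", "price": 2.5, "quantity": 4}
try:
    row["prcie"]
except KeyError as err:
    print("KeyError:", err)

# The solution: build a typed object once, then use attribute access.
product = Product(**row)
print(product.price)      # 2.5
print(product.total())    # 10.0
print(asdict(product))    # {'name': 'Widget', 'price': 2.5, 'quantity': 4}

# Validation rejects bad values at construction time.
try:
    Product(name="Broken", price=-1.0)
except ValueError as err:
    print("ValueError:", err)
```

Validating in __post_init__ means an invalid record is rejected where it enters the pipeline, rather than surfacing later as a confusing error in an unrelated step.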

Suggested Exercises


CC BY-NC-SA 4.0

https://hackyourfuture.net/