10. Gotchas & Pitfalls
Content coming soon...
Concepts to Cover
- Variables as references: labels vs boxes, implications for large datasets
- Naming conventions for pipelines: the "state" convention (raw, clean, final)
- Integers vs floats: the precision trap, why `float` fails for money, using `decimal`
- Strings: encoding nightmares (UTF-8 vs legacy), common cleaning patterns
- Lists: mutable vs immutable, list comprehensions as the ETL workhorse
- Dictionaries: the "record" type, safe access with `.get()`, nested data
- Runtime type checking: data validation with `isinstance()`
- Circular references and memory management in pipelines
- Generator expressions vs list comprehensions: memory efficiency
- The `copy` module: shallow vs deep copies and when they matter
- Late binding in closures
- Common bugs in error handling: silently catching exceptions
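
Until the full lesson content is available, several of the concepts listed above can be illustrated with a short Python sketch. It is not the course's official material. All names in it (`raw`, `record`, `is_valid_amount`, `parse_int`, and so on) are hypothetical and chosen only for illustration.

```python
import copy
from decimal import Decimal

# Variables as references: both names are labels on the same list object.
raw = [1, 2, 3]
alias = raw
alias.append(4)  # raw is now [1, 2, 3, 4] as well

# The precision trap: binary floats cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2                               # 0.30000000000000004
decimal_total = Decimal("0.10") + Decimal("0.20")     # exactly Decimal("0.30")

# Dictionaries as records: .get() avoids a KeyError on missing keys.
record = {"id": 7, "address": {"city": "Amsterdam"}}
country = record.get("country", "unknown")
city = record.get("address", {}).get("city")

# Runtime type checking: bool is a subclass of int, so exclude it explicitly.
def is_valid_amount(value):
    return isinstance(value, (int, Decimal)) and not isinstance(value, bool)

# Shallow vs deep copies: a shallow copy shares the nested objects.
nested = [{"tags": ["a"]}]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested[0]["tags"].append("b")  # visible through shallow, not through deep

# Late binding in closures: n is looked up when the lambda is called.
late = [lambda x: x * n for n in range(3)]
bound = [lambda x, n=n: x * n for n in range(3)]  # default argument freezes n
late_results = [f(10) for f in late]
bound_results = [f(10) for f in bound]

# Generator expression: values are produced one at a time, with no
# intermediate list held in memory.
total = sum(x * x for x in range(1_000))

# Error handling: catch the narrowest exception and make the failure visible
# to the caller, instead of a bare "except: pass" that hides every error.
def parse_int(text):
    try:
        return int(text)
    except ValueError:
        return None  # the caller must handle None explicitly
```

After this runs, `shallow[0]["tags"]` contains both `"a"` and `"b"` while `deep[0]["tags"]` still holds only `"a"`. Every function in `late` multiplies by the final value of `n`, whereas the functions in `bound` each keep their own value.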

*https://hackyourfuture.net/*