Week 2 - Structuring Data Pipelines

🔐 Configuration & Secrets (.env)

In the last chapter, you saw how a data pipeline moves data from sources to storage, transforming it along the way. This chapter covers how to keep your pipelines safe and flexible by handling the settings and secrets they need, such as API keys and database passwords.

You'll see how to use environment variables, .env files, and a centralized config module so your credentials don't get hardcoded in your code. By the end, your pipelines will be easier to configure, safer to share, and ready to run in different environments.

Why Hardcoding is a Problem

Imagine this:

API_KEY = "123abc"
DB_PASSWORD = "secret123"

If you commit this to Git, push it to GitHub, or share your code, anyone who can read the code can read your credentials. This is a security disaster: keys leak, databases get compromised, and running jobs break when a leaked key has to be revoked.

Hardcoding also makes it hard to switch environments (development, staging, production) because you’d have to change the code each time.

<aside> 💡 Even "temporary" hardcoded secrets tend to stay in codebases forever. Treat every secret as if it will be leaked.

</aside>

The solution? Separate configuration from code.

Environment Variables

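Environment variables are settings that live outside your code. Your operating system passes them to every process it starts, and Python can read them through os.environ in the os module. If a variable is not set, os.environ.get(...) returns None instead of raising an error.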

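import os

# Returns the value as a string, or None if API_KEY is not set
api_key = os.environ.get("API_KEY")
print(api_key)  # fine for a quick local check; avoid printing real secrets in logs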
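
The lookup behavior described above can be sketched as follows. The variable names PIPELINE_DEMO_KEY and PIPELINE_DEMO_MISSING are made up for this example, and the first one is set inside the script only so the sketch runs on its own. In a real project the value would come from your shell or deployment tool.

```python
import os

# Hypothetical variable names, set here only so the example is self-contained.
os.environ["PIPELINE_DEMO_KEY"] = "123abc"
os.environ.pop("PIPELINE_DEMO_MISSING", None)

# .get() returns the value as a string when the variable exists...
present = os.environ.get("PIPELINE_DEMO_KEY")

# ...and None (no error) when it does not.
missing = os.environ.get("PIPELINE_DEMO_MISSING")

# A second argument supplies a fallback value.
with_default = os.environ.get("PIPELINE_DEMO_MISSING", "fallback")

# Square-bracket access raises KeyError for a missing variable.
try:
    os.environ["PIPELINE_DEMO_MISSING"]
    raised = False
except KeyError:
    raised = True
```

Which form to use is a design choice: .get() with a default suits optional settings, while square-bracket access fails loudly, which is often what you want for required secrets.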

The .env Pattern

.env files are simple text files that hold your secrets locally.

Example .env file:

API_KEY="123abc"
DB_PASSWORD="secret123"

To load these variables into Python, use the python-dotenv library.

# install with: pip install python-dotenv
from dotenv import load_dotenv
import os

load_dotenv()  # reads .env file and loads variables

api_key = os.environ.get("API_KEY")
db_password = os.environ.get("DB_PASSWORD")
print(api_key, db_password)
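
To make the idea less magical, here is a deliberately simplified sketch of what a .env loader does. This is not python-dotenv's actual implementation (the real library handles more syntax), and the helper names parse_env_lines and load_into_environ are invented for illustration. One behavior it mirrors is that load_dotenv() by default leaves variables that are already set untouched (its override parameter defaults to False).

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_into_environ(values, environ=os.environ):
    """Copy parsed values into environ without replacing existing keys."""
    for key, value in values.items():
        environ.setdefault(key, value)


# The same content as the example .env file, given as a list of lines.
example = ["# local secrets", 'API_KEY="123abc"', "", 'DB_PASSWORD="secret123"']
parsed = parse_env_lines(example)

# A plain dict stands in for os.environ so the sketch has no side effects.
fake_environ = {"API_KEY": "already-set"}
load_into_environ(parsed, fake_environ)
```

After this runs, fake_environ keeps its existing API_KEY and only gains DB_PASSWORD, which is why a value exported in your shell wins over the one in .env.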

<aside> 🎬 Terminal Tutorial: Setting Up .env with python-dotenv

</aside>

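Advantages of .env: your secrets stay out of your code, and you can switch between environments (development, staging, production) by editing one file instead of the code. Add .env to your .gitignore so the file itself is never committed.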

Centralizing Settings with config.py

Once you have environment variables, you might find yourself calling os.environ.get(...) everywhere. That can get messy. Instead, create a single config.py module:

# config.py
import os
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.environ.get("API_KEY")
DB_PASSWORD = os.environ.get("DB_PASSWORD")
DB_URL = os.environ.get("DB_URL", "sqlite:///default.db")

Now, anywhere in your project:

from config import API_KEY, DB_URL

print("Connecting to database at", DB_URL)

This makes your code cleaner and more maintainable.
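
One thing a central config module is well placed to handle is type conversion, because environment variables are always strings. The sketch below is one possible approach, not a required pattern. The helpers env_bool and env_int and the settings DEBUG and DB_PORT are hypothetical names that do not appear in the config.py above.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Read a boolean setting; only a few common spellings count as true."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name, default, environ=os.environ):
    """Read an integer setting; int() raises ValueError on a bad value."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


# A plain dict stands in for os.environ so the sketch is self-contained.
fake_environ = {"DEBUG": "False", "DB_PORT": "5432"}

DEBUG = env_bool("DEBUG", environ=fake_environ)
DB_PORT = env_int("DB_PORT", 5432, environ=fake_environ)

# The pitfall this avoids: any non-empty string is truthy in Python.
naive_debug = bool(fake_environ["DEBUG"])
```

Here DEBUG ends up as the boolean False and DB_PORT as the integer 5432, while naive_debug is True even though the variable says "False".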

<aside> ⌨️ Hands on: Implement a get_config(var_name) function that reads an environment variable and raises a ValueError if it's not set. Never let None slip through silently!

🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=2&chapter=config_secrets&exercise=w2_config_secrets__safe_config&lang=python

</aside>

<aside> 🤓 Curious Geek: The 12-Factor App

The idea of separating config from code comes from the 12-Factor App methodology, a set of principles written by Heroku engineers in 2011.

Factor III states: "Store config in the environment." This means every setting that changes between environments (dev, staging, prod) should be an environment variable, never hardcoded.

Most modern deployment tools (Docker, Kubernetes, Heroku, Railway) follow this pattern natively!

</aside>
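
The point of Factor III can be illustrated with a small sketch: the same code runs unchanged while only the environment differs. The helper run_with_env and the staging URL are made up for this example, and the one-line script reuses the DB_URL default from config.py.

```python
import os
import subprocess
import sys

# A tiny stand-in for a pipeline script: it only reports which database it would use.
SCRIPT = 'import os; print(os.environ.get("DB_URL", "sqlite:///default.db"))'


def run_with_env(extra_env):
    """Run SCRIPT in a child process whose environment includes extra_env."""
    env = dict(os.environ)
    env.pop("DB_URL", None)  # start from a known state for the demo
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Same code, two environments (the staging URL is a placeholder).
dev_url = run_with_env({})
staging_url = run_with_env({"DB_URL": "postgresql://staging-host/app"})
```

With no DB_URL set, the script falls back to the SQLite default. With DB_URL set, it reports the staging database, and no line of the script had to change.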

🧠 Knowledge Check

  1. Why is hardcoding secrets in your codebase dangerous, even for small personal projects?
  2. What problem does a .env file solve during local development?
  3. Why is it a good idea to centralize settings in a config.py module?

Extra reading

<aside> 💡 In the wild: The python-dotenv library you just learned about is used by thousands of Python projects. Open its README to see the same patterns: load_dotenv(), os.environ, and .env.example files.

</aside>


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*