Week 2 - Structuring Data Pipelines
Configuration & Secrets (.env)
In the last chapter, you saw how a data pipeline moves data from sources to storage, transforming it along the way. This chapter covers how to keep your pipelines safe and flexible by handling the settings and secrets they need, such as API keys.
You'll see how to use environment variables, .env files, and a centralized config module so your credentials don't get hardcoded in your code. By the end, your pipelines will be easier to configure, safer to share, and ready to run in different environments.
Imagine this:
```python
API_KEY = "123abc"
DB_PASSWORD = "secret123"
```
If you commit this to Git, push it to GitHub, or share your code, anyone who can see the code can read your credentials. This is a security disaster: keys can leak, databases can be compromised, and production jobs can break.
Hardcoding also makes it hard to switch environments (development, staging, production) because you'd have to change the code each time.
<aside> 💡 Even "temporary" hardcoded secrets tend to stay in codebases forever. Treat every secret as if it will be leaked.
</aside>
The same caution applies when you ask an LLM for help.
<aside>
⚠️ Using AI to help: Never paste your real .env contents, API keys, or database passwords into ChatGPT, Claude, or any LLM (⚠️ no PII or production credentials!). When you need help debugging config code, replace real values with API_KEY=PLACEHOLDER-style fakes first.
</aside>
The solution? Separate configuration from code.
Environment variables are settings that live outside your code. Each program receives them from the shell or operating system that starts it, and Python can read them through the os module.
```python
import os

# Returns the value as a string, or None if API_KEY isn't set
api_key = os.environ.get("API_KEY")
print(api_key)
```
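os.environ behaves like a dictionary of strings, so the usual lookup styles apply, and they differ in what happens when a variable is missing. The sketch below contrasts them. It sets and unsets variables from Python only so the demo is self-contained (normally the shell or deployment platform sets them), and LOG_LEVEL is a hypothetical variable name. Assigning to os.environ changes only the running Python process and any child processes it starts, not your shell.

```python
import os

# Demo only: set/unset variables from Python so the snippet runs on its own.
# Normally the shell sets them, e.g. `export API_KEY=123abc`.
os.environ["API_KEY"] = "123abc"
os.environ.pop("DB_PASSWORD", None)
os.environ.pop("LOG_LEVEL", None)  # LOG_LEVEL is a hypothetical setting

# 1. .get() returns None when the variable is missing
api_key = os.environ.get("API_KEY")          # "123abc"
db_password = os.environ.get("DB_PASSWORD")  # None

# 2. .get() with a second argument returns that default instead of None
log_level = os.environ.get("LOG_LEVEL", "INFO")  # "INFO"

# 3. Square brackets raise KeyError when the variable is missing
try:
    os.environ["DB_PASSWORD"]
except KeyError:
    print("DB_PASSWORD is not set")
```

Use a default only for settings that have a safe fallback; a missing secret should be an error, not a silent None.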
.env Pattern
.env files are simple text files that hold your secrets locally.
Example .env file:
```text
API_KEY="123abc"
DB_PASSWORD="secret123"
```
To load these variables into Python, use the python-dotenv library.
```python
# install with: pip install python-dotenv
import os

from dotenv import load_dotenv

# Finds the .env file and loads its variables into os.environ.
# By default, variables that are already set are not overwritten.
load_dotenv()

api_key = os.environ.get("API_KEY")
db_password = os.environ.get("DB_PASSWORD")
print(api_key, db_password)
```
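To make load_dotenv() feel less magical, here is a simplified, stdlib-only sketch of the idea: read KEY=VALUE lines and copy them into os.environ. load_env_file is a hypothetical helper written for this illustration, not part of python-dotenv. The real library handles many more cases (for example `export` prefixes and variable expansion), so keep using load_dotenv() in your projects.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def load_env_file(path: Union[str, Path] = ".env") -> None:
    """Simplified stand-in for load_dotenv(), for illustration only."""
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching quotes: "123abc" -> 123abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        # Like load_dotenv()'s default, keep values that are already set
        os.environ.setdefault(key, value)


# Demo: write a throwaway .env file and load it
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = Path(tmp_dir) / ".env"
    env_path.write_text(
        '# local settings\nAPI_KEY="123abc"\nDB_PASSWORD="secret123"\n',
        encoding="utf-8",
    )
    os.environ.pop("API_KEY", None)           # not set yet -> loaded from file
    os.environ["DB_PASSWORD"] = "from-shell"  # already set -> file value ignored
    load_env_file(env_path)

print(os.environ["API_KEY"])      # 123abc
print(os.environ["DB_PASSWORD"])  # from-shell
```

The second print shows why the "don't overwrite" default matters: a value set in the shell wins over the .env file, so you can override a setting for a single run without editing the file.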
<aside> 🎬 Terminal Tutorial: Setting Up .env with python-dotenv
</aside>
Advantages of .env:
- Secrets stay out of your code and live only in your local .env file.
- Add .env to .gitignore: once .env is listed in .gitignore, Git won't include it when you commit or push code (as long as the file was never committed before). Your code works normally, but your secrets stay local and private.

Once you have environment variables, you might find yourself calling os.environ.get(...) everywhere. That can get messy. Instead, create a single config.py module. Save the following as config.py in your project root:
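```python
# config.py
import os

from dotenv import load_dotenv

# Load .env once, when this module is first imported
load_dotenv()

API_KEY = os.environ.get("API_KEY")
DB_PASSWORD = os.environ.get("DB_PASSWORD")
# Falls back to a local SQLite file if DB_URL isn't set
DB_URL = os.environ.get("DB_URL", "sqlite:///default.db")
```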
Now, from any other file in the same project (with config.py and .env present alongside it):
<!-- runner:expect-fail -->
```python
from config import API_KEY, DB_URL

print("Connecting to database at", DB_URL)
```
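This makes your code cleaner and more maintainable: every file reads its settings from one place, so renaming a variable or changing a default means editing only config.py.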
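One detail to keep in mind when writing config.py: os.environ only ever holds strings, even when .env says 500 or false. Converting types inside the config module means the rest of the pipeline gets real numbers and booleans. This is a minimal sketch assuming two hypothetical settings, BATCH_SIZE and DEBUG, that are not part of the example .env above; the values are set inline only so the snippet runs on its own.

```python
import os

# Demo only: pretend these came from .env (hypothetical settings)
os.environ["BATCH_SIZE"] = "500"
os.environ["DEBUG"] = "false"

raw_debug = os.environ.get("DEBUG", "false")
print(type(os.environ["BATCH_SIZE"]))  # <class 'str'>
print(bool(raw_debug))                 # True: any non-empty string is truthy

# Convert explicitly, in one place (e.g. inside config.py)
BATCH_SIZE = int(os.environ.get("BATCH_SIZE", "100"))
DEBUG = raw_debug.strip().lower() in {"1", "true", "yes"}

print(BATCH_SIZE + 1)  # 501
print(DEBUG)           # False
```

Comparing against a small set of accepted words avoids the bool("false") trap; int() will raise a ValueError if BATCH_SIZE holds something that isn't a number, which is the loud failure you want.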
<aside>
💡 Beyond .env: production secrets live in a vault. A .env file is the right pattern for local development. In production, you store secrets in a managed vault like Azure Key Vault, not on disk. The vault enforces access policies, rotation, and audit logging. Week 12 covers the migration from .env to Key Vault: same os.environ.get(...) interface, different source of truth.
</aside>
For now, focus on the local pattern. The first habit is making missing values fail loudly.
<aside>
⌨️ Hands on: Implement a get_config(var_name) function that reads an environment variable and raises a ValueError if it's not set. Never let None slip through silently!
👉 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=2&chapter=config_secrets&exercise=w2_config_secrets__safe_config&lang=python
</aside>
The pattern itself isn't new: it has a name and a manifesto.
<aside> 🤓 Curious Geek: The 12-Factor App
The idea of separating config from code comes from the 12-Factor App methodology, a set of principles written by Heroku engineers in 2011.
Factor III states: "Store config in the environment." This means every setting that changes between environments (dev, staging, prod) should be an environment variable, never hardcoded.
Most modern deployment tools (Docker, Kubernetes, Heroku, Railway) support this pattern natively!
</aside>
With the pattern in hand, the next step is to apply it.
<aside> 📝 Practice: Apply this pattern in Practice Exercise 1: Move Secrets to .env.
</aside>
- What problem does a .env file solve during local development?
- Why centralize your settings in a config.py module?
- Why is .env not the right pattern for production secrets, and what replaces it?

.env files in Python
<aside>
💡 In the wild: The python-dotenv library you just learned to use appears in thousands of Python projects. Open its README to see the same patterns: load_dotenv(), os.environ, and .env.example files.
</aside>
Next up: Separation of Concerns, where you split the "god function" into a thin I/O layer and pure logic functions, so your business rules can be tested without touching the filesystem.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.