Week 2 - Structuring Data Pipelines
Configuration & Secrets (.env)
In the last chapter, you saw how a data pipeline moves data from sources to storage, transforming it along the way. This chapter covers how to keep your pipelines safe and flexible by handling the settings and secrets they need, such as API keys.
You'll see how to use environment variables, .env files, and a centralized config module so your credentials don't get hardcoded in your code. By the end, your pipelines will be easier to configure, safer to share, and ready to run in different environments.
Imagine this:
API_KEY = "123abc"
DB_PASSWORD = "secret123"
If you commit this to Git, push it to GitHub, or share your code, anyone who can see the repository can read your credentials. This is a security disaster: keys get leaked, databases get compromised, and production jobs break.
Hardcoding also makes it hard to switch environments (development, staging, production) because you’d have to change the code each time.
<aside> 💡 Even "temporary" hardcoded secrets tend to stay in codebases forever. Treat every secret as if it will be leaked.
</aside>
The solution? Separate configuration from code.
Environment variables are settings that live outside your code. The operating system provides them to each running process, and Python can read them with the os module.
import os
# .get() returns None when API_KEY is not set in the environment
api_key = os.environ.get("API_KEY")
print(api_key)
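The two common ways of reading a variable behave differently when it is missing. The sketch below contrasts them; the helper names `read_optional` and `read_required` and the `DEMO_*` variable names are made up for illustration and are not part of the chapter.

```python
import os
from typing import Optional


def read_optional(name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the variable's value, or `default` when it is not set."""
    return os.environ.get(name, default)


def read_required(name: str) -> str:
    """Return the variable's value; os.environ[...] raises KeyError if it is missing."""
    return os.environ[name]


# Set a demo variable for this process only (hypothetical name).
os.environ["DEMO_API_KEY"] = "123abc"
os.environ.pop("DEMO_MISSING", None)

print(read_optional("DEMO_API_KEY"))              # value set above
print(read_optional("DEMO_MISSING"))              # None, no error
print(read_optional("DEMO_MISSING", "fallback"))  # default is used

try:
    read_required("DEMO_MISSING")
except KeyError as exc:
    print("missing variable:", exc)
```

Use `.get()` with a default for optional settings, and the bracket form when a missing value should stop the program right away.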
.env files are simple text files that hold your secrets locally.
Example .env file:
API_KEY="123abc"
DB_PASSWORD="secret123"
To load these variables into Python, use the python-dotenv library.
# install with: pip install python-dotenv
from dotenv import load_dotenv
import os
load_dotenv()  # reads the .env file and loads its variables into os.environ
api_key = os.environ.get("API_KEY")
db_password = os.environ.get("DB_PASSWORD")
print(api_key, db_password)
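To build intuition for what loading a .env file involves, here is a deliberately simplified, standard-library-only sketch. It is not python-dotenv's actual implementation, and the function name `load_env_file` is hypothetical. It parses `KEY=VALUE` lines and copies them into `os.environ` without replacing variables that are already set, which mirrors the default behaviour of `load_dotenv()` (`override=False`).

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str) -> dict:
    """Parse KEY=VALUE lines and copy them into os.environ without overriding."""
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop surrounding quotes
        loaded[key] = value
        # setdefault keeps a value that the shell already exported.
        os.environ.setdefault(key, value)
    return loaded


# Demo with a throwaway file; the variable names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    Path(env_path).write_text('DEMO_DB_PASSWORD="secret123"\n', encoding="utf-8")
    print(load_env_file(env_path))
```

In real projects, prefer python-dotenv itself: it handles quoting, multi-line values, and other edge cases that this sketch ignores.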
<aside> 🎬 Terminal Tutorial: Setting Up .env with python-dotenv
</aside>
Advantages of .env:
Add .env to .gitignore: once .env is listed in .gitignore, Git leaves your .env file out of commits and pushes, as long as the file was not already tracked. Your code works normally, but your secrets stay local and private.
Once you have environment variables, you might find yourself calling os.environ.get(...) everywhere. That gets messy. Instead, create a single config.py module:
# config.py
import os
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.environ.get("API_KEY")
DB_PASSWORD = os.environ.get("DB_PASSWORD")
DB_URL = os.environ.get("DB_URL", "sqlite:///default.db")
Now, anywhere in your project:
from config import API_KEY, DB_URL
print("Connecting to database at", DB_URL)
This makes your code cleaner and more maintainable.
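A central config module is also a natural place to convert values, because everything read from the environment arrives as a string. The sketch below shows one possible approach; the helpers `env_int` and `env_bool` and the settings `BATCH_SIZE` and `DEBUG` are hypothetical and not part of the config.py above.

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting; int() raises ValueError on a malformed value."""
    raw = os.environ.get(name)
    return default if raw is None else int(raw)


def env_bool(name: str, default: bool = False) -> bool:
    """Read a flag; only a few explicit spellings count as true."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical settings, shown only to illustrate the conversion step.
BATCH_SIZE = env_int("BATCH_SIZE", 100)
DEBUG = env_bool("DEBUG")

print(BATCH_SIZE, DEBUG)
```

The explicit boolean parsing matters because `bool("False")` is `True` in Python: any non-empty string is truthy.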
<aside>
⌨️ Hands on: Implement a get_config(var_name) function that reads an environment variable and raises a ValueError if it's not set. Never let None slip through silently!
🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=2&chapter=config_secrets&exercise=w2_config_secrets__safe_config&lang=python
</aside>
<aside> 🤓 Curious Geek: The 12-Factor App
The idea of separating config from code comes from the 12-Factor App methodology, a set of principles written by Heroku engineers in 2011.
Factor III states: "Store config in the environment." This means every setting that changes between environments (dev, staging, prod) should be an environment variable, never hardcoded.
Most modern deployment tools (Docker, Kubernetes, Heroku, Railway) follow this pattern natively!
</aside>
What problem does a .env file solve during local development?
Why keep settings in a single config.py module?
.env files in Python
<aside>
💡 In the wild: The python-dotenv library you just learned about is used by thousands of Python projects. Open its README to see the same patterns: load_dotenv(), os.environ, and .env.example files.
</aside>
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0