Introduction to Containers and CI/CD
Week 5 Assignment: Containerize and Ship a Data Pipeline
Your pipeline is only as reliable as its dependencies. A pipeline that runs today can break tomorrow if a library updates or a teammate installs a different version. Dependency management solves that by making your Python environment reproducible.
This chapter compares requirements.txt and uv, then shows how to manage dependencies with either option. In Week 1 you saw both approaches at a high level, along with the basics of virtual environments, package installs, and lock files (see: Python Setup). Here we go deeper, because this choice also matters for Docker and CI/CD. Both approaches are valid, but they solve reproducibility at different levels:
A package manager decides how your project installs external libraries. In this chapter, the practical choice is between the classic pip plus requirements.txt workflow and the modern uv workflow with pyproject.toml and uv.lock.
Compared with plain pip, uv offers pyproject.toml workflows, built-in lock files, and stronger reproducibility for transitive dependencies. For this track, uv is the recommended route because uv.lock pins the full dependency tree, including upstream dependencies that your direct packages pull in.
<aside>
💡 Pick one workflow and use it consistently across local dev, CI, and Docker. If you are starting fresh, prefer uv.
</aside>
The same problem exists in every ecosystem, and the solution always looks similar.
<aside>
📘 Core program connection: In the Core program with JavaScript you used npm install, package.json, and package-lock.json to manage project dependencies. In the Data Track you solve the same problem in Python with requirements.txt or with pyproject.toml plus uv.lock. The goal is the same in both tracks: declare what your project needs, lock exact versions, and make installs reproducible across machines and CI. Refresh the Core program chapter here: Core Program - Package Managers
</aside>
The requirements.txt approach is the simplest and most portable: you list your direct packages and pin exact versions.
pandas==2.2.1
requests==2.31.0
pydantic==2.6.1
Then install with pip inside a virtual environment:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
This works well, but it does not give you the same built-in lock behavior for the full dependency tree that uv.lock provides.
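To make the pinning convention concrete, here is a minimal sketch (the function and names are hypothetical, not part of pip) that extracts `==` pins from requirements.txt-style lines:

```python
import re

# Matches lines like "pandas==2.2.1"; skips comments, blank lines, and
# range specifiers such as "urllib3>=2".
PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s#]+)")

def parse_pins(lines):
    """Return {package: version} for exactly pinned requirement lines."""
    pins = {}
    for line in lines:
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins

reqs = ["pandas==2.2.1", "requests==2.31.0", "# a comment", "urllib3>=2"]
print(parse_pins(reqs))  # {'pandas': '2.2.1', 'requests': '2.31.0'}
```

Notice that only exact `==` pins survive the parse: anything expressed as a range is left for the installer to decide, which is exactly the gap discussed next.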
Pinning your direct dependencies is a good start, but it does not pin their dependencies. You might list requests==2.31.0, but requests depends on urllib3. If urllib3 releases a breaking change, pip can pull in a newer version the next time someone runs pip install, even though your requirements.txt did not change.
# requirements.txt contains:
# requests==2.31.0
#
# urllib3 is not listed, so pip resolves a version at install time
import requests
response = requests.get("https://api.example.com/data")
Two teammates running pip install -r requirements.txt a week apart can end up with different environments. A broken CI run with no code change is often the first sign that this is happening.
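A toy model (not pip's real resolver) shows why the same requirements.txt can produce different environments over time: an unpinned transitive dependency is resolved to the newest allowed version at install time, and "newest allowed" changes whenever upstream releases:

```python
def resolve(available, minimum):
    """Toy stand-in for a resolver: pick the newest available version
    that satisfies a minimum bound."""
    candidates = [version for version in available if version >= minimum]
    return max(candidates)

# requests==2.31.0 declares a range for urllib3, not an exact pin.
minimum = (1, 21, 0)
week1_releases = [(1, 26, 18), (2, 0, 7)]
week2_releases = [(1, 26, 18), (2, 0, 7), (2, 1, 0)]  # upstream shipped a release

print(resolve(week1_releases, minimum))  # (2, 0, 7)
print(resolve(week2_releases, minimum))  # (2, 1, 0) -- same file, new result
```

Nothing in requirements.txt changed between the two installs; only the set of published versions did.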
<aside>
⚠️ Pinning top-level packages controls what you install directly. It does not fully control what those packages install underneath.
</aside>
uv uses pyproject.toml as the source of truth and writes exact versions, including transitive dependencies, to uv.lock.
[project]
name = "weather-pipeline"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"pandas==2.2.1",
"requests==2.31.0",
]
# Install dependencies pinned in uv.lock
uv sync
# Run a command inside the managed environment
uv run python -m src.pipeline
<aside>
💡 Commit uv.lock. It is the record of the exact versions your CI and teammates should use.
</aside>
This is the main reason uv is recommended in this track: you get a faster workflow and stronger guarantees that CI, your laptop, and production install the same dependency graph.
The theory only sticks if you produce a lock file and watch --frozen catch drift. In a fresh directory, scaffold a project and add one dependency:
uv init weather-pipeline-lockdemo
cd weather-pipeline-lockdemo
uv add "pandas==2.2.1"
Open uv.lock. Notice it pins not just pandas but every transitive dependency (numpy, python-dateutil, pytz, and more) with exact versions. Your direct dependencies live in pyproject.toml; the full resolved graph lives in uv.lock.
Now simulate a CI install, which should use the lock file exactly:
uv sync --frozen
To see what --frozen protects you against, edit pyproject.toml and bump pandas to 2.2.2 without running uv lock. Then run uv sync --frozen again and read the error. That error is what you want CI to throw: it means somebody changed dependencies without committing an updated lock file, and the next install would silently diverge from what was tested.
Refresh the lock file on purpose with uv lock, confirm uv sync --frozen works again, and commit both files together.
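Conceptually, the --frozen check behaves like a fingerprint comparison between the project file and what the lock file was resolved from. Here is a toy model of that idea (not uv's actual mechanism):

```python
import hashlib
import json

def fingerprint(dependencies):
    """Hash a normalized dependency list -- a stand-in for the record a
    lock file keeps of the project state it was resolved from."""
    payload = json.dumps(sorted(dependencies)).encode()
    return hashlib.sha256(payload).hexdigest()

locked_from = fingerprint(["pandas==2.2.1"])  # captured at `uv lock` time
current = fingerprint(["pandas==2.2.2"])      # pyproject.toml edited later

print("drift detected" if current != locked_from else "in sync")
```

The real check compares richer metadata, but the failure mode is the same: any edit to the declared dependencies without a re-lock makes the comparison fail.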
<aside>
⚠️ Plain uv sync (without --frozen) will happily re-resolve dependencies if pyproject.toml and uv.lock disagree. That is convenient locally but exactly the behavior you do not want in CI or Docker. Always add --frozen there.
</aside>
Use requirements.txt when:
- an existing project or tool already depends on it
- you want the simplest, most portable setup with plain pip

Use uv when:
- you are starting new work you control
- you want a lock file that pins the full dependency tree, including transitive packages
- CI, Docker, and teammates must install exactly the same versions
For this track, the recommendation is:
- requirements.txt only where an existing project already uses it
- uv for new work you control
- uv sync --frozen in Docker and CI so installs follow the committed lock file exactly

Keep production dependencies separate from development tools like linters and tests. This keeps your container images smaller and your CI runs faster.
requirements.txt approach:
- requirements.txt for runtime
- requirements-dev.txt for dev tools

uv approach:
[project.optional-dependencies]
dev = [
"pytest==8.2.0",
"ruff==0.5.1",
]
<aside>
⚠️ If you install dev tools in production containers, you increase image size and risk extra vulnerabilities.
</aside>
A reproducible CI run means the same commit installs the same dependency set every time the pipeline runs. CI is your safety check before a deploy, and the signal gets noisy fast when resolvers pick different transitive packages between runs.
uv sync --frozen is the guardrail: it refuses to install if pyproject.toml and uv.lock have drifted apart, which is exactly the moment you want CI to fail loudly.
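As a sketch, a CI job that enforces the lock file might look like this in GitHub Actions (the setup-uv action reference and versions here are assumptions; adapt them to your pipeline):

```yaml
# Hypothetical workflow fragment: the build fails if pyproject.toml and
# uv.lock have drifted apart.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5   # assumed action reference; pin your own
      - run: uv sync --frozen          # refuses to install on lock drift
      - run: uv run pytest
```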
The concept of deterministic builds applies across every language and toolchain.
<aside>
🤓 Curious Geek: How lock files became standard
JavaScript added package-lock.json in 2017, and Python ecosystems followed with tools like Poetry and uv. The goal is always the same: deterministic builds.
</aside>
Before moving on, check that you can answer:
- What is one advantage of requirements.txt, and one advantage of uv?
- Why is uv the recommended route in this track?

In the next chapter you bake these pinned dependencies into a Dockerfile, so the image ships with exactly the versions you locked here.