Week 5 - Containers & CI/CD

Introduction to Containers and CI/CD

Dependency Management

Docker Fundamentals

Azure Container Registry

Python CI Pipeline

Week 5 Gotchas & Pitfalls

Practice

Week 5 Assignment: Containerize and Ship a Data Pipeline

Week 5 Lesson Plan

Dependency Management

Your pipeline is only as reliable as its dependencies. A pipeline that runs today can break tomorrow if a library updates or a teammate installs a different version. Dependency management solves that by making your Python environment reproducible.

This chapter compares requirements.txt and uv, then shows how to manage dependencies with either option. In Week 1 you learned the basics of virtual environments, package installs, and lock files. See: Python Setup

Concepts

requirements.txt vs uv

In Week 1 you saw both approaches at a high level. Here we go deeper, because this choice matters for Docker and CI/CD as well. Both approaches are valid, but they solve reproducibility at different depths: requirements.txt pins what you ask for, while a lock file pins everything that actually gets installed.

A package manager decides how your project installs external libraries. In this chapter, the practical choice is between the classic pip plus requirements.txt workflow and the modern uv workflow with pyproject.toml and uv.lock.

For this track, uv is the recommended route because uv.lock pins the full dependency tree, including upstream dependencies that your direct packages pull in.

<aside> 💡 Pick one workflow and use it consistently across local dev, CI, and Docker. If you are starting fresh, prefer uv.

</aside>

The same problem exists in every ecosystem, and the solution always looks similar.

<aside> 📘 Core program connection: In the Core program with JavaScript you used npm install, package.json, and package-lock.json to manage project dependencies. In the Data Track you solve the same problem in Python with requirements.txt or with pyproject.toml plus uv.lock. The goal is the same in both tracks: declare what your project needs, lock exact versions, and make installs reproducible across machines and CI. Refresh the Core program chapter here: Core Program - Package Managers

</aside>

Option A: requirements.txt (classic workflow)

This is the simplest and most portable approach. You list your direct packages and pin versions.

pandas==2.2.1
requests==2.31.0
pydantic==2.6.1

Then install with pip inside a virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

This works well, but it does not give you the same built-in lock behavior for the full dependency tree that uv.lock provides.

Pinning your direct dependencies is a good start, but it does not pin their dependencies. You might list requests==2.31.0, but requests depends on urllib3. If urllib3 releases a breaking change, pip can pull in a newer version the next time someone runs pip install, even though your requirements.txt did not change.

# requirements.txt contains:
# requests==2.31.0
#
# urllib3 is not listed, so pip resolves a version at install time

import requests

response = requests.get("https://api.example.com/data")

Two teammates running pip install -r requirements.txt a week apart can end up with different environments. A broken CI run with no code change is often the first sign that this is happening.
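You can make this drift concrete by diffing two pip freeze-style snapshots. A minimal sketch; the package names and versions below are illustrative, not taken from a real environment:

```python
# Compare two "pip freeze"-style snapshots and report packages whose
# pinned versions differ between environments.
def parse_freeze(text: str) -> dict[str, str]:
    """Turn 'name==version' lines into a {name: version} mapping."""
    pins = {}
    for line in text.strip().splitlines():
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_drift(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Packages present in both environments but at different versions."""
    return {
        name: (a[name], b[name])
        for name in a.keys() & b.keys()
        if a[name] != b[name]
    }


# Illustrative snapshots: same requirements.txt, installed a week apart.
laptop = parse_freeze("""
requests==2.31.0
urllib3==2.0.7
""")
ci = parse_freeze("""
requests==2.31.0
urllib3==2.2.1
""")

# requests matches, but the transitive urllib3 pin has drifted.
print(find_drift(laptop, ci))  # → {'urllib3': ('2.0.7', '2.2.1')}
```

The direct dependency looks identical in both environments; only the transitive package differs, which is exactly why pinning top-level versions alone is not enough.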

<aside> ⚠️ Pinning top-level packages controls what you install directly. It does not fully control what those packages install underneath.

</aside>

Option B: uv with pyproject.toml and uv.lock

uv uses pyproject.toml as the source of truth and writes exact versions, including transitive dependencies, to uv.lock.

[project]
name = "weather-pipeline"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
  "pandas==2.2.1",
  "requests==2.31.0",
]

Install exactly what the lock file records, then run commands inside the managed environment:

# Install dependencies pinned in uv.lock
uv sync

# Run a command inside the managed environment
uv run python -m src.pipeline

<aside> 💡 Commit uv.lock. It is the record of the exact versions your CI and teammates should use.

</aside>

This is the main reason uv is recommended in this track: you get a faster workflow and stronger guarantees that CI, your laptop, and production install the same dependency graph.

⌨️ Hands on: generate a lock file and freeze it

The theory only sticks if you produce a lock file and watch --frozen catch drift. In a fresh directory, scaffold a project and add one dependency:

uv init weather-pipeline-lockdemo
cd weather-pipeline-lockdemo
uv add "pandas==2.2.1"

Open uv.lock. Notice it pins not just pandas but every transitive dependency (numpy, python-dateutil, pytz, and more) with exact versions. Your direct dependencies live in pyproject.toml; the full resolved graph lives in uv.lock.

Now simulate a CI install, which should use the lock file exactly:

uv sync --frozen

To see what --frozen protects you against, edit pyproject.toml and bump pandas to 2.2.2 without running uv lock. Then run uv sync --frozen again and read the error. That error is what you want CI to throw: it means somebody changed dependencies without committing an updated lock file, and the next install would silently diverge from what was tested.

Refresh the lock file on purpose with uv lock, confirm uv sync --frozen works again, and commit both files together.

<aside> ⚠️ Plain uv sync (without --frozen) will happily re-resolve dependencies if pyproject.toml and uv.lock disagree. That is convenient locally but exactly the behavior you do not want in CI or Docker. Always add --frozen there.

</aside>

Which package manager should you choose?

Use requirements.txt when:

  - you are contributing to an existing pip-based project, or
  - you need maximum portability with no tooling beyond pip itself.

Use uv when:

  - you are starting a project fresh, or
  - you want the full dependency tree, including transitive packages, locked in uv.lock so that local dev, CI, and Docker all install the same graph.

For this track, the recommendation is uv: commit pyproject.toml and uv.lock together, and install with uv sync --frozen in CI and Docker.

Runtime vs dev dependencies

Keep production dependencies separate from development tools like linters and tests. This keeps your container images smaller and your CI runs faster.

requirements.txt approach: keep a second file, for example requirements-dev.txt, that lists linters and test tools. Install both files locally and in CI, but only requirements.txt in production images.

uv approach: declare dev tools in an optional dependency group in pyproject.toml:

[project.optional-dependencies]
dev = [
  "pytest==8.2.0",
  "ruff==0.5.1",
]

<aside> ⚠️ If you install dev tools in production containers, you increase image size and risk extra vulnerabilities.

</aside>

What reproducible CI actually means

A reproducible CI run means the same commit installs the same dependency set every time the pipeline runs. CI is your safety check before a deploy, and the signal gets noisy fast when resolvers pick different transitive packages between runs: a test failure might mean broken code, or it might just mean a new urllib3 release, and you cannot tell which.

uv sync --frozen is the guardrail: it refuses to install if pyproject.toml and uv.lock have drifted apart, which is exactly the moment you want CI to fail loudly.

The concept of deterministic builds applies across every language and toolchain.

<aside> 🤓 Curious Geek: Lock files became standard

JavaScript added package-lock.json in 2017, and Python ecosystems followed with tools like Poetry and uv. The goal is always the same: deterministic builds.

</aside>

Exercises

  1. Explain one advantage of requirements.txt and one advantage of uv.
  2. Add a dev dependency group and describe when you would install it.
  3. Identify one dependency in your Week 3 or Week 4 pipeline that should be pinned and explain why.

Knowledge Check

  1. What problem does a lock file solve for CI runs?
  2. Why is uv the recommended route in this track?
  3. Why should dev dependencies be kept out of production images?

Extra reading


In the next chapter you bake these pinned dependencies into a Dockerfile, so the image ships with exactly the versions you locked here.