Docker Fundamentals

A container image is a reproducible package of your pipeline. Docker is the most common tool to build and run those images. If your image works locally, it should work in CI and in Azure.

By the end of this chapter, you should know how to write a basic Dockerfile, build an image, and run it with environment variables.

Concepts

Dockerfile basics

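A Dockerfile is a recipe for building an image. The most important instructions are:

  1. FROM: the base image to start from, such as python:3.11-slim.
  2. WORKDIR: the directory inside the image where later instructions run.
  3. COPY: copies files from your project into the image.
  4. RUN: executes a command while the image is being built, such as installing dependencies.
  5. CMD: the default command that runs when a container starts.

You will see full Dockerfile examples in the next section.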

Installing dependencies in Docker: requirements.txt vs uv

You can build a Python image with either dependency workflow.

requirements.txt Dockerfile:

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY src/ ./src/

CMD ["python", "-m", "src.pipeline"]

uv Dockerfile:

Docker lets you copy files from another image using COPY --from=<image> <src> <dest>. The Dockerfile below uses this to grab the uv binary from the official uv image instead of installing it with pip.

FROM python:3.11-slim

WORKDIR /app

# Install uv from its official image (no pip needed)
COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv

COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

COPY src/ ./src/

CMD ["uv", "run", "python", "-m", "src.pipeline"]

Three things differ from the requirements.txt version:

  1. Installing uv itself. COPY --from= pulls the uv binary from a separate image: no pip needed.
  2. --frozen installs from uv.lock exactly as it is, without re-resolving or updating it, so the image gets the same dependencies as your machine. See Dependency Management for why this matters.
  3. --no-dev skips development dependencies. Your production image does not need pytest or ruff.

Use requirements.txt in Docker when the project already uses it. Use uv sync --frozen in Docker when the project uses pyproject.toml and uv.lock. Either way, the goal is the same: your Docker image installs exactly the versions you tested locally.
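A requirements.txt file only reproduces your tested versions if every line pins one with ==. The sketch below is one way to check that before building. The function name find_unpinned and the sample contents are made up for illustration, and the pattern is a deliberate simplification of the full requirement syntax.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.0".
# Assumption: this is a simplified pattern, not the full PEP 508 grammar.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^=\s;]+")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, --hash)
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Hypothetical file contents, for illustration only
sample = """\
requests==2.32.3
pandas>=2.0
# a comment
python-dotenv
"""
print(find_unpinned(sample))
```

Any line this reports can resolve to a different version on the next build, which is the drift that uv.lock prevents by design.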

<aside> 📘 Core program connection: In the Core program you built and ran containers in the Systems week. This is the same idea, now focused on Python pipelines. Review the Docker basics: Core Program - Docker

</aside>

Build and run

# Build the image
docker build -t weather-pipeline:1.0 .

# Run the container with an env var
docker run --rm -e API_KEY="redacted" weather-pipeline:1.0

If the container prints its logs and exits without an error, your image works.

--rm removes the container when it exits, so you do not accumulate stopped containers.

When you build an image, you give it a name and a tag (weather-pipeline:1.0). To push it to a registry, the full reference also includes the registry prefix (myregistry.azurecr.io/weather-pipeline:1.0). You will learn tagging strategies and registries in the next chapter.
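To make the parts of a reference concrete, here is a rough sketch that splits one into registry, name, and tag. The function split_image_reference is hypothetical, and its rule for spotting a registry (the first segment contains a dot or a colon, or is localhost) is a simplified heuristic that ignores digests such as @sha256:...

```python
def split_image_reference(reference: str) -> tuple[str, str, str]:
    """Split 'registry/name:tag' into (registry, name, tag).

    Simplified on purpose: digests are not handled, and a missing tag
    is reported as "latest".
    """
    registry = ""
    remainder = reference
    first, sep, rest = reference.partition("/")
    # Assumption: a first segment that looks like a hostname is the registry
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    # The tag is whatever follows the last ':' in the final path segment
    name, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        name, tag = remainder, "latest"
    return registry, name, tag


print(split_image_reference("myregistry.azurecr.io/weather-pipeline:1.0"))
print(split_image_reference("weather-pipeline:1.0"))
print(split_image_reference("weather-pipeline"))
```

The local name and the registry name differ only in the prefix, which is why pushing an image mostly comes down to retagging it.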

<aside> ⌨️ Hands on: Write a function build_image_tag(registry, image, tag) that returns the full image name. If no tag is provided, default to "latest". If the registry is empty, omit the prefix.

</aside>

<aside> 🚀 Try it in the widget: build_image_tag exercise

</aside>

Commands like docker run --rm --env-file .env ... get long fast, especially once you add volume mounts and ports. For an optional way to alias them into make build / make run, see A Makefile to tame Docker commands on the Going Further page.

When a build fails, paste the error message and the Dockerfile into an LLM and ask for a fix.

<aside> 💡 Using AI to help: Paste the Docker build error and your Dockerfile into an LLM and ask for a diagnosis. (⚠️ Ensure no PII or sensitive company data is included!) Always verify the suggestion against the Docker docs before applying it.

</aside>

Passing configuration to containers

You have been using .env files and os.environ since Week 2. Inside a container, the same pattern applies: your Python code reads os.environ and the configuration is injected from outside.

<aside> 📘 Core program connection: Reading configuration from environment variables follows the same principle you used in Week 2's config and secrets chapter. The Python code inside the container still calls os.environ; Docker just controls which values get injected. Review: Week 2 - Configuration & Secrets

</aside>

Pass a single variable with -e, or load your whole .env file with --env-file:

# Pass a single variable
docker run --rm -e API_KEY="redacted" weather-pipeline:1.0

# Load many variables from a file
docker run --rm --env-file .env weather-pipeline:1.0

<aside> ⚠️ Never bake secrets into a Docker image with ENV API_KEY=... in the Dockerfile. That value becomes part of the image history and can be extracted. Pass secrets at runtime instead.

</aside>

Dockerfiles also have ARG for build-time values. Use ARG for metadata like a commit SHA; use -e or --env-file for runtime configuration like API keys and database URLs.

# Build-time only (not available when the container runs)
ARG BUILD_SHA

# Runtime (available inside the running container)
ENV APP_ENV=production

<aside> ⌨️ Hands on: Run your container with --env-file .env and print each value. Then remove one required variable from .env and confirm your code fails fast.

</aside>
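Failing fast means the pipeline stops at startup with a clear message, instead of crashing halfway through a run. A minimal sketch, assuming API_KEY is the only required variable. REQUIRED_VARS and load_config are illustrative names, not part of any library.

```python
import os
from collections.abc import Mapping

# Hypothetical list; replace with the variables your pipeline really needs
REQUIRED_VARS = ("API_KEY",)


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {name: env[name] for name in REQUIRED_VARS}


# Passing a plain dict instead of os.environ makes the function easy to test
print(load_config({"API_KEY": "redacted"}))
```

Calling load_config() as the first line of the pipeline makes a missing variable show up in the first seconds of docker run, with the variable name in the error.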

If you are unsure which values belong in environment variables, use the decision list from Configuration & Secrets (Week 2) as your guide.

<aside> 💡 Using AI to help: Ask an LLM to categorize a list of config values into "secret" and "non-secret". (⚠️ Ensure no PII or sensitive company data is included!)

Always verify the result against your team's security policy.

</aside>

The practice of storing config in environment variables comes from a well-known set of guidelines.

<aside> 🤓 Curious Geek: The 12-factor rule

The 12-factor app guideline "Store config in the environment" is one reason env vars are the default in containers and CI.

</aside>

Inspect and debug a container

When a container misbehaves, you need to see what it is doing. Start with docker ps to find the container, then read its output by name or ID:

docker ps                     # List running containers (copy the NAME or ID)
docker logs <container-name>  # View a container's output (running or stopped)

If the container exits immediately, run it interactively with -it and replace its command with a shell, so you can look around before it exits:

docker run -it --rm weather-pipeline:1.0 /bin/bash

Inside the container you can run python -m src.pipeline, inspect files, and test environment variables. Exit with exit or Ctrl+D. --rm still cleans up when you leave.
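When you test environment variables inside the container, it helps to confirm that a value arrived without printing the secret itself. One possible helper is sketched below. The name describe_env and the variable names are assumptions for illustration.

```python
import os
import sys
from collections.abc import Mapping


def describe_env(names: list[str], env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Report whether each variable is set, without revealing its value."""
    report = {}
    for name in names:
        value = env.get(name)
        if value is None:
            report[name] = "<missing>"
        elif value == "":
            report[name] = "<empty>"
        else:
            report[name] = f"<set, {len(value)} chars>"
    return report


if __name__ == "__main__":
    # Print the interpreter version and the status of each expected variable
    print("python", sys.version.split()[0])
    for name, status in describe_env(["API_KEY", "APP_ENV"]).items():
        print(f"{name}: {status}")
```

Output like this is safe to paste into a chat or an issue, because it shows presence and length but never the value.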

The debugging mindset here is the same one you used in the Core program: read the error, form a hypothesis, change one thing, and re-run.

<aside> 📘 Core program connection: The Core program covered error handling and debugging strategies in JavaScript. Inside a running container, the same approach applies: read the traceback, locate the failing call, and test your fix. Refresh: Core Program - Error handling

</aside>

Try it now to build confidence before you need it in CI.

<aside> ⌨️ Hands on: Build your image, then run it with -it and /bin/bash. Run pip list (or uv pip list if you built the uv image) and python --version inside. What do you see?

</aside>

Run containers locally and get them working before you wire them into CI or Azure. The pipeline you build this week is a batch job that exits on its own; for detached mode (-d), opening a shell in an already-running container (docker exec -it), and managing long-running containers (docker stop, docker rm), see Container lifecycle: running detached on the Going Further page.

Layers and caching

Docker builds images in layers and caches each one. When a file used by an instruction changes, that layer and every layer after it rebuild. Your code changes constantly during development, so you do not want a code change to trigger a dependency reinstall.

<aside> ⚠️ If you copy the entire repo before installing dependencies, even a tiny code change will force a full reinstall in CI.

</aside>

Before (slow rebuilds):

COPY . .
# BAD: This will reinstall your dependencies every time you change your code
RUN pip install -r requirements.txt

After (fast rebuilds):

COPY requirements.txt .
RUN pip install -r requirements.txt

# GOOD: Now we only have to rebuild this part of the image when we change our code
COPY . .

The same rule applies to uv: copy pyproject.toml and uv.lock first, run uv sync --frozen, then copy the rest of the source code.
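The invalidation rule can be modelled in a few lines. This is a toy model, not how Docker computes cache keys: each step lists the files it reads from the build context, and the first step whose files changed invalidates everything after it. The name layers_to_rebuild and the file names are made up for the example.

```python
def layers_to_rebuild(steps: list[tuple[str, set[str]]], changed: set[str]) -> list[str]:
    """Return the instructions that must re-run, in order.

    steps: (instruction, files that instruction reads from the build context).
    changed: files modified since the last build.
    """
    rebuild = []
    invalidated = False
    for instruction, inputs in steps:
        # The first step whose inputs changed misses the cache, and every
        # later step misses too, even if its own inputs did not change.
        if invalidated or inputs & changed:
            invalidated = True
            rebuild.append(instruction)
    return rebuild


ALL_FILES = {"requirements.txt", "src/pipeline.py"}

slow = [
    ("COPY . .", ALL_FILES),
    ("RUN pip install -r requirements.txt", set()),
]
fast = [
    ("COPY requirements.txt .", {"requirements.txt"}),
    ("RUN pip install -r requirements.txt", set()),
    ("COPY . .", ALL_FILES),
]

changed = {"src/pipeline.py"}  # a code-only change
print(layers_to_rebuild(slow, changed))
print(layers_to_rebuild(fast, changed))
```

With the slow ordering, the code change re-runs both steps, including pip install. With the fast ordering, only the final COPY . . re-runs.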

<aside> 🖼️ Visual: Docker Layer Caching

</aside>

Production images live or die by this ordering.

<aside> 💡 In the wild: Apache Airflow's Dockerfile copies its dependency constraints before the application source, the same layer-caching order you just used, so a one-line code change does not reinstall hundreds of packages.

</aside>

Docker ignore

A .dockerignore file prevents large or sensitive files from entering the image. This keeps builds fast and avoids leaking secrets.

.venv/
__pycache__/
.env
.git/

<aside> ⚠️ You already added .env to .gitignore in Week 2 to keep secrets out of Git. .dockerignore is the same habit for images: keep the build context lean so secrets, the .git/ history, and other unwanted files never enter the image in the first place.

</aside>
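To build intuition for which paths a .dockerignore excludes, here is an approximate model using Python's fnmatch. It is a sketch under stated assumptions: Docker's real matcher follows its own rules, and this version does not handle ** wildcards or ! exceptions. The helper is_ignored is hypothetical.

```python
from fnmatch import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Approximate check: does any pattern exclude this path from the build context?"""
    parts = path.split("/")
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # ".venv/" and ".venv" name the same directory
        # Compare the pattern with each leading portion of the path, so that
        # ".git" also excludes ".git/config".
        for depth in range(1, len(parts) + 1):
            if fnmatch("/".join(parts[:depth]), pattern):
                return True
    return False


patterns = [".venv/", "__pycache__/", ".env", ".git/"]

for candidate in [".env", ".git/config", "src/pipeline.py", "src/__pycache__/pipeline.pyc"]:
    print(candidate, is_ignored(candidate, patterns))
```

In this model, patterns are matched from the root of the build context, so __pycache__/ does not cover src/__pycache__/. Docker's documentation describes the same root-relative behaviour and offers **/__pycache__ for nested directories, so it is worth checking your own patterns against the docs.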

The .gitignore pattern comes from Git itself, which you first used in the Core program.

<aside> 📘 Core program connection: .gitignore and .dockerignore follow the same idea: tell a tool which files to skip. You first created a .gitignore in the Core program's Git introduction. Review: Core Program - Intro to Git

</aside>

Every layer in an image is inspectable, which is a handy debugging tool in its own right.

<aside> 🤓 Curious Geek: You can inspect any image

</aside>

Exercises

  1. Explain why .dockerignore matters for security and speed.
  2. Rewrite a requirements.txt Dockerfile so dependency installs are cached.
  3. Write the equivalent uv Dockerfile using uv sync --frozen.
  4. Explain the difference between ARG and ENV in a Dockerfile.
  5. List three values that should never be hardcoded in a container image.

Knowledge Check

<aside> 🚀 Try it in the widget: Interactive Quiz: Docker Fundamentals

</aside>

https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_5_ch3_docker_fundamentals&embed=1

If writing a Dockerfile for your Python script felt unclear, this video walks through containerizing a Python application from scratch.

<aside> 🎬 Struggling with this concept? Watch this beginner-friendly video:

</aside>

https://www.youtube.com/watch?v=0TFWtfFY87U

Extra reading