Introduction to Containers and CI/CD
Week 5 Assignment: Containerize and Ship a Data Pipeline
A container image is a reproducible package of your pipeline. Docker is the most common tool to build and run those images. If your image works locally, it should work in CI and in Azure.
By the end of this chapter, you should know how to write a basic Dockerfile, build an image, and run it with environment variables.
A Dockerfile is a recipe for building an image. The most important instructions are:
- FROM: the base image
- WORKDIR: the working directory inside the container
- COPY: copy files into the image
- RUN: run build steps like installing dependencies
- CMD: default command when the container starts

You will see full Dockerfile examples in the next section.
You can build a Python image with either dependency workflow.
requirements.txt Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src/
CMD ["python", "-m", "src.pipeline"]
uv Dockerfile:
FROM python:3.11-slim
WORKDIR /app
# Install uv from its official image (no pip needed)
COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY src/ ./src/
CMD ["uv", "run", "python", "-m", "src.pipeline"]
Three things differ from the requirements.txt version:
- COPY --from= pulls the uv binary from a separate image. This is a Docker multi-stage pattern: you do not need to install uv with pip.
- --frozen is critical. It tells uv to use uv.lock exactly as written, without re-resolving. Without this flag, uv may pick different versions inside Docker than on your machine, defeating the purpose of the lock file.
- --no-dev skips development dependencies. Your production image does not need pytest or ruff.

<aside>
⚠️ If you forget --frozen, uv will re-resolve dependencies during the build. Your image may end up with different versions than your lock file specifies. Always use --frozen in Dockerfiles.
</aside>
Use requirements.txt in Docker when the project already uses it. Use uv sync --frozen in Docker when the project uses pyproject.toml and uv.lock. Either way, the goal is the same: your Docker image installs exactly the versions you tested locally.
<aside> 📘 Core program connection: In the Core program you built and ran containers in the Systems week. This is the same idea, now focused on Python pipelines. Review the Docker basics: Core Program - Docker
</aside>
# Build the image
docker build -t weather-pipeline:1.0 .
# Run the container with an env var
docker run --rm -e API_KEY="redacted" weather-pipeline:1.0
If the container prints logs and exits, your image works.
--rm removes the container when it exits, so you do not accumulate stopped containers.
Commands like docker run --rm --env-file .env ... get long fast, especially once you add volume mounts and ports. For an optional way to alias them into make build / make run, see A Makefile to tame Docker commands on the Going Further page.
You have been using .env files and os.environ since Week 2. Inside a container, the same pattern applies: your Python code reads os.environ and the configuration is injected from outside.
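Inside the pipeline, that pattern can look like the following sketch. The variable names (API_KEY, APP_ENV) and the default value are illustrative, not prescribed by the assignment:

```python
import os

def load_config() -> dict:
    """Read configuration from the environment, failing fast on missing values."""
    api_key = os.environ.get("API_KEY")
    if api_key is None:
        # Fail fast with a clear message instead of crashing later mid-pipeline
        raise RuntimeError("API_KEY is not set; pass it with -e or --env-file")
    return {
        "api_key": api_key,
        # Optional values get a sensible default
        "app_env": os.environ.get("APP_ENV", "development"),
    }
```

The container does not care whether the values came from `-e`, `--env-file`, or your CI system; the code only sees `os.environ`.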
Pass a single variable with -e, or load your whole .env file with --env-file:
# Pass a single variable
docker run --rm -e API_KEY="redacted" weather-pipeline:1.0
# Load many variables from a file
docker run --rm --env-file .env weather-pipeline:1.0
<aside>
⚠️ Never bake secrets into a Docker image with ENV API_KEY=... in the Dockerfile. That value becomes part of the image history and can be extracted. Pass secrets at runtime instead.
</aside>
Dockerfiles also have ARG for build-time values, which you pass with docker build --build-arg NAME=value. Use ARG for metadata like a commit SHA; use -e or --env-file for runtime configuration like API keys and database URLs.
# Build-time only (not available when the container runs)
ARG BUILD_SHA
# Runtime (available inside the running container)
ENV APP_ENV=production
<aside>
⌨️ Hands on: Run your container with --env-file .env and print each value. Then remove one required variable from .env and confirm your code fails fast.
</aside>
For a refresher on which values belong in environment variables, see Configuration & Secrets (Week 2); if you are unsure, use its decision list as your guide.
<aside> 💡 Using AI to help: Ask an LLM to categorize a list of config values into "secret" and "non-secret". (⚠️ Ensure no PII or sensitive company data is included!)
Always verify the result against your team's security policy.
</aside>
The practice of storing config in environment variables comes from a well-known set of guidelines.
<aside> 🤓 Curious Geek: The 12-factor rule
The 12-factor app guideline "Store config in the environment" is one reason env vars are the default in containers and CI.
</aside>
By default, docker run runs in the foreground: the terminal stays attached to the container until it exits. For long-running services (e.g. a REST API), run in detached mode with -d so you get your terminal back:
# Foreground (default): blocks until the container exits
docker run --rm -e API_KEY="redacted" weather-pipeline:1.0
# Detached: runs in the background
docker run -d --name my-weather -e API_KEY="redacted" weather-pipeline:1.0
Useful lifecycle commands:
docker ps # List running containers
docker ps -a # List all containers (including stopped)
docker stop my-weather # Gracefully stop a container
docker rm my-weather # Remove a stopped container
docker logs my-weather # View output from a running or stopped container
docker exec -it my-weather /bin/bash # Get a shell inside a running container (for debugging)
<aside> 💡 Run containers locally first. Get them working before you wire them into CI or Azure.
</aside>
When something goes wrong inside the container, run it interactively with -it to get a shell:
docker run -it --rm weather-pipeline:1.0 /bin/bash
Inside the container you can run python -m src.pipeline, inspect files, and test environment variables. Exit with exit or Ctrl+D. --rm still cleans up when you leave.
<aside>
⌨️ Hands on: Build your image, then run it with -it and /bin/bash. Run pip list and python --version inside. What do you see?
</aside>
Docker builds images in layers and caches each one. When an instruction or a file it copies changes, that layer and every layer after it rebuild. Your code changes continuously during development, and you don't want to reinstall your dependencies every time you change your code.
<aside> ⚠️ If you copy the entire repo before installing dependencies, even a tiny code change will force a full reinstall in CI.
</aside>
Before (slow rebuilds):
COPY . .
# BAD: This will reinstall your dependencies every time you change your code
RUN pip install -r requirements.txt
After (fast rebuilds):
COPY requirements.txt .
RUN pip install -r requirements.txt
# GOOD: Now only the layers from this point on rebuild when we change our code
COPY . .
The same rule applies to uv: copy pyproject.toml and uv.lock first, run uv sync --frozen, then copy the rest of the source code.
<aside> 🖼️ Visual: Docker Layer Caching
</aside>
A .dockerignore file prevents large or sensitive files from entering the image. This keeps builds fast and avoids leaking secrets.
.venv/
__pycache__/
.env
.git/
<aside>
⚠️ You already added .env to .gitignore in Week 2 to keep secrets out of Git. .dockerignore is the same habit for images: a secret baked into a layer is extractable with docker history, even if it is not in your repo.
</aside>
When you build an image, you give it a name and a tag: weather-pipeline:1.0. When pushing to a registry, the full reference includes the registry prefix: myregistry.azurecr.io/weather-pipeline:1.0. You will learn more about tagging strategies and registries in the next chapter.
<aside>
⌨️ Hands on: Write a function build_image_tag(registry, image, tag) that returns the full image name. If no tag is provided, default to "latest". If the registry is empty, omit the prefix.
</aside>
<aside> 🚀 Try it in the widget: build_image_tag exercise
</aside>
<aside> 💡 Using AI to help: If a Docker build fails, paste the error message and the Dockerfile into an LLM and ask for a fix. (⚠️ Ensure no PII or sensitive company data is included!)
</aside>
Double-check the suggestion against the Docker docs before you apply it.
<aside> 🤓 Curious Geek: You can inspect any image
Run docker history weather-pipeline:1.0 to see every layer in your image, the command that created it, and its size. This is how you debug large images and spot unnecessary files. It is also why secrets baked into a layer are never truly hidden: anyone with access to the image can read the history.
</aside>
Key takeaways:

- A .dockerignore matters for security and speed.
- Structure a requirements.txt Dockerfile so dependency installs are cached.
- Structure a uv Dockerfile using uv sync --frozen.
- Know the difference between ARG and ENV in a Dockerfile.

Check your understanding:

- Why must uv use --frozen inside Docker and CI?
- What is a .dockerignore, and what belongs inside it?
- When would you use ARG instead of ENV?

In the next chapter you push this image to Azure Container Registry so it can be pulled by CI and by cloud services later in the track.