Week 5 - Containers & CI/CD

Assignment: Containerize and Ship

The Scenario

It is sprint review day at StreamFlow Analytics. Your pipeline runs fine locally, but the previous engineer left a note: "Works on my machine. No idea how to deploy it." Your lead wants the pipeline containerized and shipping to Azure Container Registry automatically on every merge to main, before next sprint.

Your goal is to make your Week 3 or Week 4 pipeline reproducible, containerized, and delivered through a CI workflow.

How to Start

The Week 5 assignment repo lives in the HYF organization. Your cohort uses a fork in the HackYourAssignment organization: your teacher will share the link at the start of the week.

  1. Fork your cohort's repo from HackYourAssignment into your own GitHub account.
  2. Clone your fork locally.
  3. Create a feature branch: git switch -c week5-attempt.
  4. Work through the tasks below.
  5. Commit, push, and open a Pull Request. Share the PR URL with your teacher once you are ready for review.

<aside> 💻 Open in GitHub Codespaces

</aside>

If you open in Codespaces, Docker and the Azure CLI are pre-installed. Run az login --use-device-code before Task 7 and sign in with the HackYourFuture credentials your teacher provided.

Technical Requirements

Use this project structure:

week5-container-assignment/
├── .github/
│   └── workflows/
│       └── ci.yml
├── assets/
│   └── acr_push_week5.png    (screenshot deliverable — Task 7)
├── src/
│   └── pipeline.py
├── tests/
│   └── test_pipeline.py
├── Dockerfile
├── requirements.txt          (or pyproject.toml + uv.lock)
└── AI_ASSIST.md

Task 1: Choose a Pipeline

Pick one pipeline you already built in Week 3 or Week 4.

Copy the code into your assignment folder and make sure it runs locally before you write a single line of Dockerfile.

<aside> 💡 Choose the pipeline you understand best. You will debug it inside a container.

</aside>

Connect to Azure Blob Storage

Your pipeline needs to download the input files from Azure Blob Storage. Use a connection string for authentication: it works the same way inside Docker as it does locally, as long as the variable is passed to the container (for example with --env-file .env, as in Task 4).

Add AZURE_STORAGE_CONNECTION_STRING to your .env file. Fetch it from the class Key Vault:

az login --use-device-code
az keyvault secret show \
  --vault-name kv-hyf-data \
  --name azure-storage-connection-string-demo \
  --query value -o tsv

Copy the output and add it to your .env:

AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=...

Then connect using the connection string instead of DefaultAzureCredential:

import os
from azure.storage.blob import BlobServiceClient

conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = BlobServiceClient.from_connection_string(conn)

<aside> ⚠️ DefaultAzureCredential relies on az login from your host machine. Inside a Docker container that credential chain does not work without extra setup. The connection string approach is simpler and works everywhere: locally, in Docker, and on Azure.

</aside>
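A minimal sketch of the download step that could follow the connection code above. The container name `raw-data`, the blob name `events.csv`, and the helpers `make_service` and `download_input` are hypothetical names chosen for illustration; substitute whatever your own pipeline reads.

```python
import os
from pathlib import Path


def make_service():
    """Build a BlobServiceClient from the connection string in the environment."""
    # Imported lazily so download_input can be unit-tested without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    return BlobServiceClient.from_connection_string(conn)


def download_input(service, container: str, blob_name: str, dest_dir: str = "data") -> Path:
    """Download one blob into dest_dir and return the local file path."""
    blob = service.get_blob_client(container=container, blob=blob_name)
    dest = Path(dest_dir) / Path(blob_name).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    # readall() loads the whole blob into memory, which is fine for small inputs.
    dest.write_bytes(blob.download_blob().readall())
    return dest


# Example call (placeholder names, not taken from the assignment):
# path = download_input(make_service(), "raw-data", "events.csv")
```

Passing the service in as an argument keeps the download logic testable with a fake object, which fits the unit-test requirement in Task 3.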

Task 2: Define Dependencies

Create a requirements.txt or a pyproject.toml with pinned versions. Your pipeline must run in a clean virtual environment.

<aside> 💡 Not sure which to use? Use whatever your Week 3 or Week 4 pipeline already has (usually requirements.txt). Switch to uv only if you are starting fresh. Both are accepted: see Dependency Management.

</aside>

Example requirements.txt:

requests==2.31.0
pydantic==2.6.1

Verify it works:

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python -m src.pipeline

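<aside> ⚠️ If your pipeline relies on packages that are installed system-wide but not listed in requirements.txt, it will fail inside the container and in CI, because neither has them installed.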

</aside>
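To check that every dependency really is pinned, a small script along these lines may help. It is a sketch, not an official tool: `unpinned_requirements` and `check_file` are hypothetical helper names, and the regular expression only accepts the simple `name==version` form (optionally with extras).

```python
import re
from pathlib import Path

# Matches "name==version", optionally with extras such as "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=\s]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like "-r"
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


def check_file(path: str = "requirements.txt") -> list[str]:
    """Read a requirements file and return its unpinned lines."""
    return unpinned_requirements(Path(path).read_text())
```

An empty list from `check_file()` means every line uses an exact version.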

Task 3: Write Tests

The CI workflow runs pytest on every pull request and on every push to main. Write at least two unit tests for your pipeline functions so the step passes.

You do not need to test the full pipeline end-to-end: test the individual helper functions (cleaning, transforming, validating). Each test should cover one function with one clear assertion.

Example for a clean_name function:

from src.pipeline import clean_name

def test_clean_name_strips_whitespace():
    assert clean_name("  Alice  ") == "Alice"

def test_clean_name_handles_empty():
    assert clean_name("") == ""

Verify locally:

pytest tests/ -v

<aside> 💡 If your pipeline is one large script with no helper functions, extract one or two small functions first: this makes both testing and Docker debugging easier.

</aside>
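The example tests import `clean_name` from `src/pipeline.py`, but its body is not shown. One hypothetical implementation that would satisfy both tests, assuming the function only needs to trim whitespace:

```python
def clean_name(name: str) -> str:
    """Return the name with leading and trailing whitespace removed."""
    return name.strip()
```

Small, pure functions like this take a value and return a value, which is what makes a one-assertion test possible.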

Task 4: Write a Dockerfile

Create a Dockerfile that:

  1. Uses python:3.11-slim as the base image
  2. Copies requirements.txt (or pyproject.toml + uv.lock) before copying source code, so dependency installs are cached (see Docker Fundamentals)
  3. Installs dependencies
  4. Copies your source code
  5. Runs the pipeline with CMD

Test it locally before moving to Task 6:

<aside> ⚠️ Replace <your-handle> with the lowercase form of your GitHub username throughout this assignment (e.g. alice-pipeline, not Alice-pipeline). Docker image names must be lowercase: docker build rejects uppercase characters with an "invalid reference format" error. The cohort also shares one ACR instance in Task 7, and a unique image-name prefix prevents your push from overwriting a classmate's tag. Pick your handle once now and use the same name in every Docker and CI command.

</aside>

docker build -t <your-handle>-pipeline:1.0 .
docker run --rm --env-file .env <your-handle>-pipeline:1.0

<aside> ⚠️ If docker run exits immediately with no output, run without -d first so you can see the logs.

</aside>
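The lowercase rule for image names can be checked before you run docker build. The sketch below uses a hypothetical helper, `image_name`, and a deliberately simplified pattern that is stricter than Docker's full naming grammar; it is meant only to catch the uppercase mistake described above.

```python
import re

# Simplified rule for this assignment: lowercase letters and digits,
# optionally separated by single '.', '_' or '-' characters.
_VALID_NAME = re.compile(r"^[a-z0-9]+([._-][a-z0-9]+)*$")


def image_name(github_username: str, suffix: str = "pipeline") -> str:
    """Build '<your-handle>-pipeline' from a GitHub username, lowercased."""
    name = f"{github_username.strip().lower()}-{suffix}"
    if not _VALID_NAME.match(name):
        raise ValueError(f"{name!r} is not a valid Docker image name")
    return name
```

For example, `image_name("Alice")` returns `alice-pipeline`, which is the form to use in every Docker and CI command.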

Task 5: Add Configuration

Move secrets and runtime values out of your code into environment variables. Name the key variable API_KEY (or whatever your pipeline uses). Your container must read it from the environment, not from a hardcoded value or a committed .env file.

<aside> ⌨️ Hands on: Run your container with API_KEY unset and confirm it exits with a clear error message rather than silently producing wrong output.

</aside>
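One way to get the clear error the hands-on exercise asks for is to read required variables through a single helper. This is a sketch under the assumption that your pipeline uses `API_KEY`; `require_env` is a hypothetical name, not a library function.

```python
import os
import sys


def require_env(name: str) -> str:
    """Return the value of an environment variable or exit with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so both `docker run` and CI report a failure.
        sys.exit(
            f"ERROR: environment variable {name} is not set. "
            f"Pass it with --env-file .env or -e {name}=..."
        )
    return value
```

Calling `require_env("API_KEY")` at the start of the pipeline makes the container fail fast instead of running with a missing value.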

Task 6: Build a CI Workflow

Create .github/workflows/ci.yml that runs on pull requests and on pushes to main:

name: CI

on:
  push:
    branches: ["main"]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Lint
        run: ruff check src
      - name: Format
        run: ruff format --check src
      - name: Test
        run: pytest -q
      - name: Build image
        run: docker build -t <your-handle>-pipeline:${{ github.sha }} .

<aside> 💡 Your requirements.txt (or pyproject.toml) must include ruff and pytest so the Lint and Test steps find them. If you use uv, replace the Install step with uv sync --frozen.

</aside>

See Python CI Pipeline for a full explanation of each step. Push this workflow first without the ACR push step (Task 7) and confirm lint and tests go green before adding the registry push.
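Before pushing, it can save a round trip to run the same lint, format, and test commands locally. The script below is a sketch: `CI_STEPS` mirrors the commands in the workflow above, while `run_steps` is a hypothetical helper and assumes ruff and pytest are installed in your active environment.

```python
import subprocess

# The same commands the workflow runs, in the same order.
CI_STEPS = [
    ["ruff", "check", "src"],
    ["ruff", "format", "--check", "src"],
    ["pytest", "-q"],
]


def run_steps(steps, runner=subprocess.run) -> int:
    """Run each command in turn; stop at the first failure and return its exit code."""
    for cmd in steps:
        print("$", " ".join(cmd))
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0


# In a script, exit with the result so the shell sees the failure:
# raise SystemExit(run_steps(CI_STEPS))
```

Stopping at the first failing step matches how a GitHub Actions job behaves by default: later steps are skipped once one fails.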

Task 7: Push to Azure Container Registry

<aside> ⚠️ Before you start: you need an AZURE_CREDENTIALS secret in your repository. Your teacher sends the JSON over Slack DM to every student in the cohort: it is the same JSON for everyone. Ping your teacher in Slack if you have not received it by the time you reach this task, then follow the steps in Python CI Pipeline: Setting up Azure credentials for CI.

Treat this JSON as a secret. It is a service-principal client secret that grants push access to the cohort registry. Never commit it to git, never paste it into a help channel or screenshot, and never share it outside the cohort. Store it only as the GitHub Secret in step 1 below.

</aside>

If you do not yet have the Slack JSON, or the CI push fails and you are running short of time, jump to Fallback: push from your laptop at the end of this task. Your own Azure login already has AcrPush on hyfregistry, so the laptop path needs no extra credentials. The deliverable does not change: the image still lands in ACR, and the Portal screenshot is still what you submit.

Once you have the credentials:

  1. Go to your repository on GitHub: Settings → Secrets and variables → Actions → New repository secret. Name it AZURE_CREDENTIALS and paste the JSON your teacher gave you.
  2. Add the following steps to your ci.yml job, after the Build image step:
         - name: Azure login
           uses: azure/login@v2
           with:
             creds: ${{ secrets.AZURE_CREDENTIALS }}
         - name: ACR login
           run: az acr login --name hyfregistry
         - name: Push image
           run: |
             docker tag <your-handle>-pipeline:${{ github.sha }} hyfregistry.azurecr.io/<your-handle>-pipeline:${{ github.sha }}
             docker push hyfregistry.azurecr.io/<your-handle>-pipeline:${{ github.sha }}
  3. Push your branch and watch the Actions tab. A green run means your image is in ACR.
  4. Verify in the Azure Portal:

<aside> 💡 If the push step fails with "unauthorized", check that your AZURE_CREDENTIALS secret is set correctly and that your teacher has granted the service principal the AcrPush role on hyfregistry.

</aside>

Fallback: push from your laptop

If you do not yet have the Slack JSON, or your CI push fails repeatedly and you are running short of time, push the image to ACR by hand from your own machine. Your class Azure account already has AcrPush on hyfregistry (granted to the HYF-Students group), so you do not need any extra credentials beyond signing in. The deliverable does not change: the image lands in ACR with the right tag, and the Azure Portal screenshot in step 4 above is still what you submit.

# 1. Sign in with your HYF Azure account (--tenant pins you to the HYF directory
#    so the registry permissions apply even if your `az` already has another account).
az login --use-device-code --tenant 07a14c4e-d88c-42f7-83b3-13af7e57ff3d

# 2. Authenticate Docker to the registry
az acr login --name hyfregistry

# 3. Build, tag, and push with a SHA-style tag (matches what CI would have produced)
SHA=$(git rev-parse HEAD)
docker build -t hyfregistry.azurecr.io/<your-handle>-pipeline:$SHA .
docker push hyfregistry.azurecr.io/<your-handle>-pipeline:$SHA

Then go back to step 4 above and take the same Azure Portal screenshot.

<aside> 💭 The CI push is still the goal: it is what proves the automated, build-verified delivery loop that real data teams rely on. Use this fallback only when the CI path is genuinely blocking you, and flag the CI failure to your teacher so the two of you can debug it together afterwards.

</aside>

Task 8: AI Assist Report

Create AI_ASSIST.md and describe:

Deliverables

Submit all of the following: