Assignment: Containerize and Ship
It is sprint review day at StreamFlow Analytics. Your pipeline runs fine locally, but the previous engineer left a note: "Works on my machine. No idea how to deploy it." Your lead wants the pipeline containerized and shipping to Azure Container Registry automatically on every merge to main, before next sprint.
Your goal is to make your Week 3 or Week 4 pipeline reproducible, containerized, and delivered through a CI workflow.
The Week 5 assignment repo lives in the HYF organization. Your cohort uses a fork in the HackYourAssignment organization: your teacher will share the link at the start of the week.
Create a working branch with git switch -c week5-attempt.
<aside> 💻 Open in GitHub Codespaces
</aside>
If you open in Codespaces, Docker and the Azure CLI are pre-installed. Run az login --use-device-code before Task 7 and sign in with the HackYourFuture credentials your teacher provided.
Use this project structure:
week5-container-assignment/
├── .github/
│   └── workflows/
│       └── ci.yml
├── assets/
│   └── acr_push_week5.png    (screenshot deliverable, Task 7)
├── src/
│   └── pipeline.py
├── tests/
│   └── test_pipeline.py
├── Dockerfile
├── requirements.txt          (or pyproject.toml + uv.lock)
└── AI_ASSIST.md
- Use logging instead of print for pipeline status (see Python Setup, Week 1).
- Keep secrets in uncommitted .env files.
- Use pathlib.Path for file paths.
Pick one pipeline you already built:
Copy the code into your assignment folder and make sure it runs locally before you write a single line of Dockerfile.
<aside> 💡 Choose the pipeline you understand best. You will debug it inside a container.
</aside>
Your pipeline needs to download the input files from Azure Blob Storage. Use a connection string for authentication: it works the same way inside Docker as it does locally, with no extra configuration needed.
Add AZURE_STORAGE_CONNECTION_STRING to your .env file. Fetch it from the class Key Vault:
az login --use-device-code
az keyvault secret show \
--vault-name kv-hyf-data \
--name azure-storage-connection-string-demo \
--query value -o tsv
Copy the output and add it to your .env:
AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=...
Then connect using the connection string instead of DefaultAzureCredential:
import os
from azure.storage.blob import BlobServiceClient
conn = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = BlobServiceClient.from_connection_string(conn)
<aside>
⚠️ DefaultAzureCredential relies on az login from your host machine. Inside a Docker container that credential chain does not work without extra setup. The connection string approach is simpler and works everywhere: locally, in Docker, and on Azure.
</aside>
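A minimal sketch of the download step, assuming a hypothetical container name (raw-data) and blob name (events.csv); neither appears in this assignment, so substitute the names your teacher provides. The helper takes the client as a parameter, which keeps it easy to unit test without network access.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def download_inputs(service, container: str, blob_names: list[str], dest: Path) -> list[Path]:
    """Download each named blob into dest and return the local paths.

    `service` is expected to be a BlobServiceClient, for example one built with
    BlobServiceClient.from_connection_string(conn) as shown above.
    """
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in blob_names:
        blob = service.get_blob_client(container=container, blob=name)
        # Keep only the file name so blobs in virtual folders land flat in dest.
        target = dest / Path(name).name
        target.write_bytes(blob.download_blob().readall())
        logger.info("Downloaded %s to %s", name, target)
        saved.append(target)
    return saved
```

A call might look like download_inputs(service, "raw-data", ["events.csv"], Path("data")), where both names are placeholders.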
Create a requirements.txt or a pyproject.toml with pinned versions. Your pipeline must run in a clean virtual environment.
<aside>
💡 Not sure which to use? Use whatever your Week 3 or Week 4 pipeline already has (usually requirements.txt). Switch to uv only if you are starting fresh. Both are accepted: see Dependency Management.
</aside>
Example requirements.txt:
requests==2.31.0
pydantic==2.6.1
Verify it works:
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python -m src.pipeline
<aside> ⚠️ If you rely on system-wide packages, your container will fail in CI.
</aside>
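To check pinning before you build, a small helper along these lines can list any requirement that lacks an exact version. This is a simplified sketch: the function name is hypothetical, and it only looks for == on each line, so it does not understand URLs, hashes, or environment markers.

```python
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

For example, passing the contents of requirements.txt (read with pathlib.Path("requirements.txt").read_text()) should return an empty list when every dependency is pinned.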
The CI workflow runs pytest on every pull request and on every push to main. Write at least two unit tests for your pipeline functions so the step passes.
You do not need to test the full pipeline end-to-end: test the individual helper functions (cleaning, transforming, validating). Each test should cover one function with one clear assertion.
Example for a clean_name function:
from src.pipeline import clean_name


def test_clean_name_strips_whitespace():
    assert clean_name(" Alice ") == "Alice"


def test_clean_name_handles_empty():
    assert clean_name("") == ""
Verify locally:
pytest tests/ -v
<aside> 💡 If your pipeline is one large script with no helper functions, extract one or two small functions first: this makes both testing and Docker debugging easier.
</aside>
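As an illustration of that extraction, the sketch below splits a hypothetical row-cleaning script into small functions. The clean_name body is an assumption chosen to satisfy the example tests above, and is_valid_row and run are invented names; your own pipeline will have different helpers.

```python
def clean_name(name: str) -> str:
    """Strip surrounding whitespace from a name."""
    return name.strip()


def is_valid_row(row: dict) -> bool:
    """A row is valid when it has a non-empty cleaned name."""
    return bool(clean_name(row.get("name", "")))


def run(rows: list[dict]) -> list[dict]:
    """Pipeline body: keep valid rows and normalise their names."""
    return [{**row, "name": clean_name(row["name"])} for row in rows if is_valid_row(row)]
```

Each helper can now be tested on its own, and when the container misbehaves you can call one function at a time instead of rerunning the whole script.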
Create a Dockerfile that:
- Uses python:3.11-slim as the base image
- Copies and installs requirements.txt (or pyproject.toml + uv.lock) before copying source code, so dependency installs are cached (see Docker Fundamentals)
- Starts the pipeline with CMD
Test it locally before moving to Task 6:
<aside>
⚠️ Replace <your-handle> with the lowercase form of your GitHub username throughout this assignment (e.g. alice-pipeline, not Alice-pipeline). Docker image names must be lowercase: docker build rejects uppercase characters with an "invalid reference format" error. The cohort also shares one ACR instance in Task 7, and a unique image-name prefix prevents your push from overwriting a classmate's tag. Pick your handle once now and use the same name in every Docker and CI command.
</aside>
docker build -t <your-handle>-pipeline:1.0 .
docker run --rm --env-file .env <your-handle>-pipeline:1.0
<aside>
⚠️ If docker run exits immediately with no output, run without -d first so you can see the logs.
</aside>
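The lowercase rule for image names can be checked before you ever call docker build. The sketch below is an assumption-laden helper (image_name is a made-up function, and the regular expression is a simplified form of Docker's rule for one repository name component), not part of the Docker tooling.

```python
import re

# Simplified rule for a repository name component: lowercase letters and digits,
# separated by '.', '_', '__', or one or more dashes.
_NAME_RE = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")


def image_name(github_username: str, suffix: str = "pipeline") -> str:
    """Build '<handle>-<suffix>' from a GitHub username, lowercased."""
    name = f"{github_username.strip().lower()}-{suffix}"
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid Docker image name")
    return name
```

With this, image_name("Alice") gives alice-pipeline, which is the form to use in every Docker and CI command.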
Move secrets and runtime values out of your code into environment variables. Name the key variable API_KEY (or whatever your pipeline uses). Your container must read it from the environment, not from a hardcoded value or a committed .env file.
<aside>
⌨️ Hands on: Run your container with API_KEY unset and confirm it exits with a clear error message rather than silently producing wrong output.
</aside>
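One way to get that clear error is to validate required variables at startup. The following is a sketch under assumptions: require_env and main are illustrative names, and API_KEY stands in for whichever variable your pipeline actually needs.

```python
import logging
import os

logger = logging.getLogger(__name__)


def require_env(name: str, env=os.environ) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def main() -> int:
    logging.basicConfig(level=logging.INFO)
    try:
        require_env("API_KEY")
    except RuntimeError as exc:
        logger.error("%s", exc)
        return 1  # non-zero exit code so docker run and CI report the failure
    logger.info("API_KEY found, starting pipeline")
    return 0
```

Calling sys.exit(main()) from the module entry point passes the exit code on to Docker, so a missing variable stops the container instead of letting it continue with bad input.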
Create .github/workflows/ci.yml that runs on pull requests and on pushes to main:
name: CI
on:
  push:
    branches: ["main"]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Lint
        run: ruff check src
      - name: Format
        run: ruff format --check src
      - name: Test
        run: pytest -q
      - name: Build image
        run: docker build -t <your-handle>-pipeline:${{ github.sha }} .
<aside>
💡 Your requirements.txt (or pyproject.toml) must include ruff and pytest so the Lint and Test steps find them. If you use uv, replace the Install step with uv sync --frozen.
</aside>
See Python CI Pipeline for a full explanation of each step. Push this workflow first without the ACR push step (Task 7) and confirm lint and tests go green before adding the registry push.
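To catch failures before pushing, you could run the same checks locally in the same order. The script below is only a sketch: ci_commands and run_ci are hypothetical helpers, the handle alice is a placeholder, and it assumes ruff, pytest, and docker are already on your PATH.

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def ci_commands(handle: str, tag: str) -> list[list[str]]:
    """Return the workflow's lint, format, test, and build steps as argv lists."""
    return [
        ["ruff", "check", "src"],
        ["ruff", "format", "--check", "src"],
        ["pytest", "-q"],
        ["docker", "build", "-t", f"{handle}-pipeline:{tag}", "."],
    ]


def run_ci(handle: str, tag: str = "local") -> int:
    """Run each step in order and stop at the first failure, as a CI job would."""
    for cmd in ci_commands(handle, tag):
        logger.info("Running: %s", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            logger.error("Step failed: %s", cmd[0])
            return result.returncode
    return 0
```

Running run_ci("alice") from the repository root returns 0 only when all four steps succeed; a missing tool raises FileNotFoundError rather than returning a code.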
<aside>
⚠️ Before you start: you need an AZURE_CREDENTIALS secret in your repository. Your teacher sends the JSON over Slack DM to every student in the cohort: it is the same JSON for everyone. Ping your teacher in Slack if you have not received it by the time you reach this task, then follow the steps in Python CI Pipeline: Setting up Azure credentials for CI.
Treat this JSON as a secret. It is a service-principal client secret that grants push access to the cohort registry. Never commit it to git, never paste it into a help channel or screenshot, and never share it outside the cohort. Store it only as the GitHub Secret in step 1 below.
</aside>
If you do not yet have the Slack JSON, or the CI push fails and you are running short of time, jump to Fallback: push from your laptop at the end of this task. Your own Azure login already has AcrPush on hyfregistry, so the laptop path needs no extra credentials. The deliverable does not change: the image still lands in ACR, and the Portal screenshot is still what you submit.
Once you have the credentials:
1. In your GitHub repository settings, add a secret named AZURE_CREDENTIALS and paste the JSON your teacher gave you.
2. Add these steps to your ci.yml job, after the Build image step:
      - name: Azure login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: ACR login
        run: az acr login --name hyfregistry
      - name: Push image
        run: |
          docker tag <your-handle>-pipeline:${{ github.sha }} hyfregistry.azurecr.io/<your-handle>-pipeline:${{ github.sha }}
          docker push hyfregistry.azurecr.io/<your-handle>-pipeline:${{ github.sha }}
3. In the Azure Portal, type hyfregistry in the top search bar and open the Container Registry.
4. Open the repository <your-handle>-pipeline, confirm your commit SHA tag is listed, and save a screenshot as assets/acr_push_week5.png.
<aside>
💡 If the push step fails with "unauthorized", check that your AZURE_CREDENTIALS secret is set correctly and that your teacher has granted the service principal the AcrPush role on hyfregistry.
</aside>
If you do not yet have the Slack JSON, or your CI push fails repeatedly and you are running short of time, push the image to ACR by hand from your own machine. Your class Azure account already has AcrPush on hyfregistry (granted to the HYF-Students group), so you do not need any extra credentials beyond signing in. The deliverable does not change: the image lands in ACR with the right tag, and the Azure Portal screenshot in step 4 above is still what you submit.
# 1. Sign in with your HYF Azure account (--tenant pins you to the HYF directory
# so the registry permissions apply even if your `az` already has another account).
az login --use-device-code --tenant 07a14c4e-d88c-42f7-83b3-13af7e57ff3d
# 2. Authenticate Docker to the registry
az acr login --name hyfregistry
# 3. Build, tag, and push with a SHA-style tag (matches what CI would have produced)
SHA=$(git rev-parse HEAD)
docker build -t hyfregistry.azurecr.io/<your-handle>-pipeline:$SHA .
docker push hyfregistry.azurecr.io/<your-handle>-pipeline:$SHA
Then go back to step 4 above and take the same Azure Portal screenshot.
<aside> 💭 The CI push is still the goal: it is what proves the automated, build-verified delivery loop that real data teams rely on. Use this fallback only when the CI path is genuinely blocking you, and flag the CI failure to your teacher so the two of you can debug it together afterwards.
</aside>
Create AI_ASSIST.md and describe:
Submit all of the following:
- tests/test_pipeline.py with at least 2 passing tests
- Dockerfile
- requirements.txt or pyproject.toml with pinned versions
- .github/workflows/ci.yml that passes on GitHub Actions