Your container should not be built from unverified code. A CI pipeline runs checks automatically so you catch errors before deployment.
By the end of this chapter, you should be able to read and write a basic GitHub Actions workflow for linting, testing, building a container image, and pushing it to Azure Container Registry.
For data projects, a good CI pipeline includes:
<aside> 💡 Start with tests and formatting. Add more checks as your pipeline grows.
</aside>
The sequence of these checks matters: fast ones first, slow ones last, so you fail quickly on the cheapest errors.
<aside> 🖼️ Visual: CI Pipeline Flow
</aside>
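The fail-fast ordering can be sketched in plain Python. This only illustrates the idea and is not how GitHub Actions is implemented. Each check is a hypothetical callable with a rough cost estimate, checks run cheapest first, and the run stops at the first failure, the same way a job stops at its first failing step.

```python
from typing import Callable, NamedTuple, Optional


class Check(NamedTuple):
    name: str
    cost_seconds: float  # rough estimate of how long the check takes
    run: Callable[[], bool]  # returns True if the check passes


def run_checks(checks: list[Check]) -> tuple[list[str], Optional[str]]:
    """Run checks cheapest-first and stop at the first failure.

    Returns the names of the checks that ran, and the name of the
    failing check (or None if everything passed).
    """
    ran: list[str] = []
    for check in sorted(checks, key=lambda c: c.cost_seconds):
        ran.append(check.name)
        if not check.run():
            return ran, check.name  # fail fast: the slower checks never run
    return ran, None


if __name__ == "__main__":
    # Hypothetical checks with made-up costs; lint fails on purpose.
    checks = [
        Check("build image", 120.0, lambda: True),
        Check("tests", 30.0, lambda: True),
        Check("lint", 2.0, lambda: False),
        Check("format check", 1.0, lambda: True),
    ]
    ran, failed = run_checks(checks)
    print(f"ran: {ran}, failed: {failed}")
    # prints: ran: ['format check', 'lint'], failed: lint
```

In a real workflow you get this ordering by writing the steps cheapest-first in the YAML file: GitHub Actions runs steps in the order they are listed and, by default, skips the remaining steps once one fails.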
You already use ruff locally to lint and format your code (see Linting and Formatting, Week 2). In CI, you run the exact same commands as workflow steps. The difference is that CI runs them automatically on every push, so unformatted or broken code is caught before it reaches main.
The workflow example below includes ruff check src as a lint step and ruff format --check src as a format check. The --check flag verifies formatting without modifying any files (CI should check, not fix).
<aside>
📘 Core program connection: Code style enforcement is not new. In the Core program you used autoformatting tools with JavaScript to keep code consistent. ruff is the Python equivalent: one tool that lints and formats. Review: Core Program - Style: Autoformatting
</aside>
A workflow file is a YAML file in the .github/workflows/ directory of your repository, for example .github/workflows/ci.yml. Every workflow has three structural layers:
- on: defines what triggers the workflow (a push to main, a pull request, or a scheduled time)
- jobs: groups the work into named units that each run on a separate machine (a runner)
- steps: lists the individual commands inside a job:
  - uses: references a reusable action, usually from the GitHub Marketplace
  - run: executes a shell command

The on: pull_request trigger means CI also runs for pull requests opened from any branch, not just for pushes to main. This is the same branch-based workflow you practiced in the Core program.
<aside> 📘 Core program connection: GitHub Actions workflows trigger on the branches and pull requests you already use for code review. If the branch-and-PR model feels rusty, review it here: Core Program - Git branches
</aside>
```yaml
name: CI

on:
  push:
    branches: ["main"]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5  # installs uv on the runner
      - name: Install dependencies
        run: uv sync --frozen
      - name: Lint
        run: uv run ruff check src
      - name: Format check
        run: uv run ruff format --check src
      - name: Test
        run: uv run pytest -q
```
<aside> 📘 Core program connection: You already learned CI ideas in the Systems week. The same workflow model applies, but now you run Python tooling. Refresh here: Core Program - Continuous integration
</aside>
This lint-then-test shape is common in mature Python projects.
<aside> 💡 In the wild: pydantic's CI workflow runs the same sequence on every pull request: ruff to lint, then the test suite, so unstyled or broken code never reaches the main branch.
</aside>
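To make CI install exactly the dependency versions you tested locally, use uv sync --frozen (it installs the versions recorded in uv.lock without updating the lockfile) or pin versions in requirements.txt.
The test step in CI runs the same pytest suite you wrote in Week 2. CI does not introduce new testing concepts: it runs your existing tests in a clean environment on every push.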
<aside>
📘 Core program connection: pytest was introduced in Week 2 - Testing with pytest. The Core program also covered unit testing in JavaScript (Core Program - Unit testing). The same "write a function, write a test, run them automatically" loop now runs inside a GitHub Actions job.
</aside>
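For reference, here is the kind of test the Test step picks up. It is a sketch built around a hypothetical transform, celsius_to_fahrenheit, from a weather pipeline. In a real project the function would live in src/ and the test in a tests/test_*.py file; they are combined here so the block runs on its own. pytest collects every function whose name starts with test_.

```python
import math


def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical transform step from a weather pipeline."""
    return celsius * 9 / 5 + 32


def test_freezing_point() -> None:
    assert celsius_to_fahrenheit(0) == 32


def test_boiling_point() -> None:
    assert celsius_to_fahrenheit(100) == 212


def test_body_temperature() -> None:
    # The result is a float, so compare with a tolerance instead of ==.
    assert math.isclose(celsius_to_fahrenheit(37), 98.6)
```

Running uv run pytest -q locally should give the same result CI reports. If a test passes locally but fails in CI, the cause is usually the environment: unpinned dependencies, a missing secret, or a file that was never committed.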
Your workflow needs API keys and Azure credentials, but these must never appear in code. GitHub Secrets stores them encrypted and makes them available to workflows as ${{ secrets.SECRET_NAME }}.
GitHub Secrets is the CI equivalent of the .env file you use locally. The same rule applies in both places: never commit secret values to the repository.
<aside> 📘 Core program connection: You learned to keep secrets out of source code in Week 2 - Configuration & Secrets. GitHub Secrets is how that same rule is enforced in CI: environment variables are injected at run time, never stored in the workflow file itself.
</aside>
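On the Python side, your code reads the secret the same way in both places: from an environment variable. Here is a minimal sketch, assuming the variable is named API_KEY as in this section; the helper name get_api_key is hypothetical. It checks for an empty value as well as a missing one, because in GitHub Actions a secret that was never defined expands to an empty string.

```python
import os


def get_api_key(env_var: str = "API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Locally the value comes from your .env setup; in CI it comes from
    GitHub Secrets via the step's env: block. The code does not care which.
    """
    value = os.environ.get(env_var, "").strip()
    if not value:
        # Name the missing variable, but never print a secret's value.
        raise RuntimeError(f"{env_var} is not set; add it to .env or GitHub Secrets")
    return value
```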
Adding a secret to your repository:
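1. Go to your repository Settings → Secrets and variables → Actions.
2. Click New repository secret.
3. Enter the name (for example API_KEY) and the value, then click Add secret.

The secret is now available in any workflow in that repository.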
<aside>
⌨️ Hands on: Add a secret called API_KEY to your repository. Use a dummy value like test-key-123 for now.
</aside>
In your workflow, reference a secret with ${{ secrets.SECRET_NAME }} and pass it as an environment variable:
```yaml
- name: Run pipeline smoke test
  env:
    API_KEY: ${{ secrets.API_KEY }}
  # uv run uses the project's virtual environment created by uv sync
  run: uv run python -m src.pipeline --limit 10
```
<aside>
⚠️ GitHub masks secrets in logs, but avoid printing them with echo. If a secret is accidentally exposed, rotate it immediately.
</aside>
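The smoke-test step above assumes your pipeline has a command-line entry point that accepts --limit, so CI can run it on a handful of records instead of a full load. If yours does not have one yet, here is a minimal sketch of what src/pipeline.py could look like. fetch_records, the record shape, and the default of 1000 are hypothetical placeholders for your own extract step.

```python
import argparse
import os
import sys
from typing import Optional


def fetch_records(api_key: str, limit: int) -> list[dict]:
    """Hypothetical extract step: replace with your real API call."""
    # Fake records so the sketch runs without network access.
    return [{"id": i, "temp_c": 20.0 + i} for i in range(limit)]


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Run the weather pipeline.")
    parser.add_argument(
        "--limit",
        type=int,
        default=1000,  # hypothetical full-run batch size
        help="process at most this many records (small values for CI smoke tests)",
    )
    args = parser.parse_args(argv)

    api_key = os.environ.get("API_KEY", "")
    if not api_key:
        # Report the problem without printing any secret value.
        print("API_KEY is not set", file=sys.stderr)
        return 1  # a non-zero exit code makes the CI step fail

    records = fetch_records(api_key, args.limit)
    print(f"processed {len(records)} records")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The important design choice is the exit code: GitHub Actions marks a run: step as failed when its command exits with a non-zero status, so returning 1 on a missing key turns a silent problem into a red CI job.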
CI should build the Docker image on every push to confirm the Dockerfile works with the current code. Docker is pre-installed on ubuntu-latest runners, so no extra setup is needed.
```yaml
- name: Build image
  run: docker build --platform linux/amd64 -t weather-pipeline:${{ github.sha }} .
```
Tagging with the commit SHA (${{ github.sha }}) gives every build a unique, traceable tag.
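To see why the SHA makes the tag traceable, here is a small Python sketch; the image_tag helper is hypothetical and not part of Docker or GitHub. The tag is simply the image name plus the 40-character commit hash, so for any image in the registry you can run git show with that hash to find the exact code it was built from. Inside a workflow, the GITHUB_SHA environment variable holds the same value as ${{ github.sha }}.

```python
import os
import re

SHA_PATTERN = re.compile(r"[0-9a-f]{40}")


def image_tag(image_name: str, sha: str) -> str:
    """Build '<image_name>:<commit sha>', rejecting anything that is not a full SHA-1."""
    if not SHA_PATTERN.fullmatch(sha):
        raise ValueError(f"not a full 40-character commit SHA: {sha!r}")
    return f"{image_name}:{sha}"


if __name__ == "__main__":
    # GitHub Actions sets GITHUB_SHA on every runner.
    sha = os.environ.get("GITHUB_SHA")
    if sha is None:
        print("GITHUB_SHA is not set (not running in GitHub Actions)")
    else:
        print(image_tag("weather-pipeline", sha))
```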
<aside>
⌨️ Hands on: Add a pytest -q step to your workflow and confirm the job fails when you break a test.
</aside>
In the previous chapter you pushed to ACR using az acr login, which uses your personal Azure session. CI runners do not have your browser session, so they need a service principal, an identity created specifically for automation.
A service principal works like a username and password for a machine. Its credentials come as a JSON block that looks like this:
```json
{
  "clientId": "...",
  "clientSecret": "...",
  "subscriptionId": "...",
  "tenantId": "..."
}
```
Your teacher creates one service principal for the whole cohort with the AcrPush role (permission to push images to the registry) and shares the same JSON with everyone in the class. You cannot use your own az login here: the CI runner is a headless machine with no browser to sign in through, so it needs this machine identity instead.
Store the credentials as a GitHub Secret:
Go to your repository Settings → Secrets and variables → Actions → New repository secret. Name it AZURE_CREDENTIALS and paste the full JSON block as the value.
<aside>
⌨️ Hands on: Ask your teacher for the cohort's Azure credentials. Store them as AZURE_CREDENTIALS in your GitHub repository secrets.
</aside>
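A common mistake is pasting an incomplete or malformed JSON block, which only shows up later as a failed Azure login step. If you want to check the block before saving it, a small sketch like this one can help; the function name missing_credential_keys is hypothetical. It parses the JSON and reports missing or empty keys without ever printing the values. Run it on your own machine, for example by piping in a local file kept outside your repository.

```python
import json
import sys

REQUIRED_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_keys(raw: str) -> list[str]:
    """Return the required keys that are absent or empty in the credentials JSON.

    Raises ValueError if the text is not a JSON object. Never prints values.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg} (line {exc.lineno})") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return [key for key in REQUIRED_KEYS if not data.get(key)]


if __name__ == "__main__":
    missing = missing_credential_keys(sys.stdin.read())
    print("OK" if not missing else f"missing or empty: {', '.join(missing)}")
```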
Now you can automate the push. The azure/login action reads your AZURE_CREDENTIALS secret to authenticate the runner, az acr login then signs Docker in to the registry, and docker push uploads the image.
```yaml
- name: Azure login
  uses: azure/login@v2
  with:
    creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: ACR login
  run: az acr login --name hyfregistry
- name: Build and push
  run: |
    docker build --platform linux/amd64 -t hyfregistry.azurecr.io/<your-handle>-weather-pipeline:${{ github.sha }} .
    docker push hyfregistry.azurecr.io/<your-handle>-weather-pipeline:${{ github.sha }}
```
<aside>
⚠️ Replace <your-handle> with the lowercase form of your GitHub username (Docker image names must be lowercase). The registry is shared with the whole cohort, so a unique image-name prefix keeps your image separate from your classmates' and makes it easy to find in the portal. See Azure Container Registry: The registry is shared for the full rationale.
</aside>
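The lowercase rule is easy to get wrong when your GitHub username contains capitals. As a sketch (image_reference is a hypothetical helper; the pattern is the path-component rule from the OCI distribution specification, which Docker follows), this builds the full reference used in the workflow and rejects image names a registry would refuse. It does not validate the tag.

```python
import re

# One path component of an image name, per the OCI distribution spec:
# lowercase letters and digits, optionally joined by '.', '_', '__' or runs of '-'.
COMPONENT = re.compile(r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*")

REGISTRY = "hyfregistry.azurecr.io"


def image_reference(github_handle: str, image: str, tag: str) -> str:
    """Build '<registry>/<handle>-<image>:<tag>' with the handle lowercased."""
    name = f"{github_handle.lower()}-{image}"
    if not COMPONENT.fullmatch(name):
        raise ValueError(f"invalid image name: {name!r}")
    return f"{REGISTRY}/{name}:{tag}"


print(image_reference("OctoCat", "weather-pipeline", "abc123"))
# prints: hyfregistry.azurecr.io/octocat-weather-pipeline:abc123
```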
Once the push step succeeds, the same image you built on CI is sitting in ACR, ready for any Azure service that has pull access to reuse it.
<aside> 📘 Core program connection: In the Core program you learned about continuous delivery and deployment. Here the "delivery" step is pushing an image to ACR. Review the concept: Core Program - Continuous delivery
</aside>
The snippets above work best as two separate jobs in a single workflow file. Split test from push-to-acr so you get fast feedback on code quality before spending time building and pushing an image.
```yaml
name: CI

on:
  push:
    branches: ["main"]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - name: Install dependencies
        run: uv sync --frozen
      - name: Lint
        run: uv run ruff check src
      - name: Format check
        run: uv run ruff format --check src
      - name: Test
        run: uv run pytest -q

  push-to-acr:
    needs: test  # runs only after the test job succeeds
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'  # skip on pull requests
    steps:
      - uses: actions/checkout@v4
      - name: Azure login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: ACR login
        run: az acr login --name hyfregistry
      - name: Build and push
        run: |
          docker build --platform linux/amd64 -t hyfregistry.azurecr.io/<your-handle>-weather-pipeline:${{ github.sha }} .
          docker push hyfregistry.azurecr.io/<your-handle>-weather-pipeline:${{ github.sha }}
```
Two things to notice:
- needs: test: push-to-acr starts only if the test job succeeds. If lint or a test fails, CI stops before touching ACR, so you push only verified code.
- if: github.ref == 'refs/heads/main': the push job runs only when the commit lands on main, not for pull requests. PRs still get the full test job for fast feedback, but they do not produce a new image until the code is reviewed and merged.

In Week 6 you will deploy this image to Azure Container Apps so it runs on a schedule or on demand. You carry the exact same image forward, so Week 6 adds a deployment step on top of work you have already finished rather than a separate set of cloud skills to learn from scratch.
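To check your reading of the two conditions, here is a plain-Python model of them. It only illustrates the logic and is not how GitHub evaluates workflows; the function name push_job_runs is hypothetical. For a pull request, github.ref has the form refs/pull/<number>/merge, so the comparison with refs/heads/main is false.

```python
def push_job_runs(test_result: str, ref: str) -> bool:
    """Model of the two gates on push-to-acr (an illustration, not GitHub's evaluator).

    needs: test                          -> the test job's result must be "success"
    if: github.ref == 'refs/heads/main'  -> the run must be for a commit on main
    """
    return test_result == "success" and ref == "refs/heads/main"


# Push to main with green tests: the image is built and pushed.
print(push_job_runs("success", "refs/heads/main"))  # True
# Pull request #42 with green tests: ref is refs/pull/42/merge, so no push.
print(push_job_runs("success", "refs/pull/42/merge"))  # False
# Push to main, but a test failed: push-to-acr is skipped.
print(push_job_runs("failure", "refs/heads/main"))  # False
```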
CI logs can be long and hard to read when a job fails.
<aside> 💡 Using AI to help: Paste a failing CI log and the relevant test file into an LLM and ask for the most likely root cause. (⚠️ Ensure no PII or sensitive company data is included!) Always reproduce the issue locally before committing a fix.
</aside>
CI tooling has changed dramatically since the early 2000s, but the core idea has stayed the same.
<aside> 🤓 Curious Geek: Why CI was invented
</aside>
<aside> 🚀 Try it in the widget: Interactive Quiz: Python CI Pipeline
</aside>
https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_5_ch5_python_ci_pipeline&embed=1
If the GitHub Actions workflow syntax or how triggers and steps connect felt unclear, this video explains the mechanics from scratch.
<aside> 🎬 Struggling with this concept? Watch this beginner-friendly video:
</aside>
https://www.youtube.com/watch?v=9flcoQ1R0Y4
<aside> 📚 For deeper topics (reusable workflows, matrix builds, self-hosted runners, caching strategies), see the optional Going Further page.
</aside>
Ready to wire this up end-to-end? Exercise 5 walks through the full CI Smoke Test against your own repo.