Introduction to Containers and CI/CD
Week 5 Assignment: Containerize and Ship a Data Pipeline
Your container should not be built from unverified code. A CI pipeline runs checks automatically so you catch errors before deployment.
By the end of this chapter, you should be able to read and write a basic GitHub Actions workflow for linting, testing, building a container image, and pushing it to Azure Container Registry.
For data projects, a good CI pipeline includes the checks covered in this chapter: linting, a format check, tests, and a container image build.
<aside> 💡 Start with tests and formatting. Add more checks as your pipeline grows.
</aside>
The sequence of these checks matters: fast ones first, slow ones last, so you fail quickly on the cheapest errors.
<aside> 🖼️ Visual: CI Pipeline Flow
</aside>
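The fail-fast ordering can also be rehearsed locally before you push. The sketch below is one possible way to do that, not part of the course tooling: `CHECKS` and `run_checks` are hypothetical names, and the commands assume ruff and pytest are installed and that your code lives in src. It runs the checks cheapest-first and stops at the first one that fails.

```python
import subprocess

# Hypothetical check list, ordered from cheapest to most expensive.
# Assumption: ruff and pytest are installed and the code lives in src/.
CHECKS = [
    ("lint", ["ruff", "check", "src"]),
    ("format", ["ruff", "format", "--check", "src"]),
    ("test", ["pytest", "-q"]),
]


def run_checks(checks):
    """Run each check in order; return the name of the first failure, or None."""
    for name, command in checks:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Stop here: later (slower) checks are skipped, as in CI.
            return name
    return None
```

Calling `run_checks(CHECKS)` from the project root returns `None` when everything passes, or the name of the first failing check.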
You already use ruff locally to lint and format your code (see Linting and Formatting, Week 2). In CI, you run the exact same commands as workflow steps. The difference: CI runs them automatically on every push, so unformatted or broken code never reaches main.
The workflow example below includes ruff check src as a lint step. You can also add ruff format --check src to verify formatting without modifying files (CI should check, not fix).
A workflow file lives in .github/workflows/ci.yml. It has triggers, jobs, and steps.
name: CI
on:
  push:
    branches: ["main"]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Lint
        run: ruff check src
      - name: Format check
        run: ruff format --check src
      - name: Test
        run: pytest -q
<aside> Core program connection: You already learned CI ideas in the Systems week. The same workflow model applies, but now you run Python tooling. Refresh here: Core Program - Continuous integration
</aside>
Your workflow needs API keys and Azure credentials, but these must never appear in code. GitHub Secrets stores them encrypted and makes them available to workflows as ${{ secrets.SECRET_NAME }}.
Adding a secret to your repository:
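1. Go to your repository Settings → Secrets and variables → Actions → New repository secret.
2. Enter the name (for example API_KEY) and the value.
3. Click Add secret.
The secret is now available in any workflow in that repository.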
<aside>
⌨️ Hands on: Add a secret called API_KEY to your repository. Use a dummy value like test-key-123 for now.
</aside>
In your workflow, reference a secret with ${{ secrets.SECRET_NAME }} and pass it as an environment variable:
- name: Run pipeline smoke test
  env:
    API_KEY: ${{ secrets.API_KEY }}
  run: python -m src.pipeline --limit 10
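On the Python side, the pipeline can pick up the injected value from its environment. This is a minimal sketch, assuming your pipeline reads API_KEY through `os.environ`; `load_api_key` is a hypothetical helper, not something the course code is known to define. It fails with a clear message when the variable is missing and never prints the value.

```python
import os


def load_api_key(env=os.environ):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("API_KEY", "").strip()
    if not key:
        # Report only the variable name, never the secret itself.
        raise RuntimeError(
            "API_KEY is not set; add it as a GitHub Secret or export it locally"
        )
    return key
```

Passing the mapping as a parameter keeps the function easy to test with a plain dict instead of the real environment.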
<aside>
⚠️ GitHub masks secrets in logs, but avoid printing them with echo. If a secret is accidentally exposed, rotate it immediately.
</aside>
CI should build the Docker image on every push to confirm the Dockerfile works with the current code. Docker is pre-installed on ubuntu-latest runners, so no extra setup is needed.
- name: Build image
  run: docker build -t weather-pipeline:${{ github.sha }} .
Tagging with the commit SHA (${{ github.sha }}) gives every build a unique, traceable tag.
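If a script ever needs the same tag, GitHub Actions also exposes the commit SHA to each step as the GITHUB_SHA environment variable. The sketch below shows one way to compose the image reference from it. `image_reference` is a hypothetical helper, and the check assumes the usual 40-character hexadecimal commit hash.

```python
import re


def image_reference(name, sha, registry=None):
    """Build an image reference tagged with a full commit SHA."""
    # Assumption: a full 40-character lowercase hexadecimal commit hash.
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError("expected a full 40-character hexadecimal commit SHA")
    tagged = f"{name}:{sha}"
    # Prefix the registry host only when one is given.
    return f"{registry}/{tagged}" if registry else tagged
```

Inside a workflow step, `image_reference("weather-pipeline", os.environ["GITHUB_SHA"])` (after `import os`) would yield the same tag as the build step above.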
<aside>
⌨️ Hands on: Add a pytest -q step to your workflow and confirm the job fails when you break a test.
</aside>
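If your project does not yet have a test that is easy to break, something like the following could serve for this exercise. Everything here is invented for illustration: `celsius_to_fahrenheit` and both tests are hypothetical, and the file could live anywhere pytest discovers tests (for example a file named test_conversion.py).

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def test_freezing_point():
    assert celsius_to_fahrenheit(0) == 32


def test_boiling_point():
    assert celsius_to_fahrenheit(100) == 212
```

Changing the `32` in the function to another number makes both tests fail, so `pytest -q` exits with a non-zero code and the workflow job is marked as failed.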
In the previous chapter you pushed to ACR using az acr login, which uses your personal Azure session. CI runners do not have your browser session, so they need a service principal, an identity created specifically for automation.
A service principal is like a username and password for a machine. Its credentials come as a JSON block that looks like this:
{
  "clientId": "...",
  "clientSecret": "...",
  "subscriptionId": "...",
  "tenantId": "..."
}
Your teacher will create a service principal for your team with the AcrPush role (permission to push images to the registry) and share the JSON with you.
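Before you store the JSON anywhere, a quick local check can confirm that it parses and contains the four fields shown above. This is only a sketch: `missing_credential_keys` is a hypothetical helper, and it checks key names only, so the secret values are never printed or returned.

```python
import json

# The four fields from the JSON block shown above.
REQUIRED_KEYS = {"clientId", "clientSecret", "subscriptionId", "tenantId"}


def missing_credential_keys(raw_json):
    """Return the sorted list of required keys missing from the JSON text."""
    data = json.loads(raw_json)  # raises an error on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Compare key names only; the values are never inspected or shown.
    return sorted(REQUIRED_KEYS - set(data))
```

An empty list means all four fields are present. A non-empty list names the fields to ask your teacher about.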
Store the credentials as a GitHub Secret:
Go to your repository Settings → Secrets and variables → Actions → New repository secret. Name it AZURE_CREDENTIALS and paste the full JSON block as the value.
<aside>
⌨️ Hands on: Ask your teacher for your team's Azure credentials. Store them as AZURE_CREDENTIALS in your GitHub repository secrets.
</aside>
Now you can automate the push. The azure/login action reads your AZURE_CREDENTIALS secret to authenticate, and then Docker commands push to your registry.
- name: Azure login
  uses: azure/login@v2
  with:
    creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: ACR login
  run: az acr login --name hyfregistry
- name: Build and push
  run: |
    docker build -t hyfregistry.azurecr.io/weather-pipeline:${{ github.sha }} .
    docker push hyfregistry.azurecr.io/weather-pipeline:${{ github.sha }}
<aside> Core program connection: In the Core program you learned about continuous delivery and deployment. Here the "delivery" step is pushing an image to ACR. Review the concept: Core Program - Continuous delivery
</aside>
In Week 6 you will deploy that image to Azure Container Apps so it runs on a schedule or on demand.
<aside> 💡 Using AI to help: Paste a failing CI log and the relevant test file into an LLM and ask for the most likely root cause. (⚠️ Ensure no PII or sensitive company data is included!)
</aside>
Always reproduce the issue locally before committing a fix.
<aside> 🤓 Curious Geek: Why CI was invented
Early CI tools like CruiseControl appeared in the early 2000s to reduce integration pain in large codebases. The idea was to merge often and test automatically to avoid "merge hell".
For the full CI lineage from CruiseControl to GitHub Actions, see the optional History of Containers and CI/CD page.
</aside>
With the concepts in place, the optional Practice exercises let you rehearse the Dockerfile and CI patterns before tackling the Assignment.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
