Python CI Pipeline

Your container should not be built from unverified code. A CI pipeline runs checks automatically so you catch errors before deployment.

By the end of this chapter, you should be able to read and write a basic GitHub Actions workflow for linting, testing, building a container image, and pushing it to Azure Container Registry.

Concepts

What CI should check

For data projects, a good CI pipeline includes the checks this chapter covers: linting, a formatting check, tests, and a container image build.

<aside> 💡 Start with tests and formatting. Add more checks as your pipeline grows.

</aside>

The sequence of these checks matters: fast ones first, slow ones last, so you fail quickly on the cheapest errors.
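The fail-fast ordering can also be sketched outside GitHub Actions. The snippet below is a minimal illustration and not part of the course repository: `run_checks` and `CHECKS` are hypothetical names, and the commands assume ruff and pytest are installed and that your code lives in src.

```python
import subprocess

# Hypothetical check list, ordered from cheapest to most expensive.
# Assumes ruff and pytest are installed and the code lives in src/.
CHECKS = [
    ("lint", ["ruff", "check", "src"]),
    ("format", ["ruff", "format", "--check", "src"]),
    ("test", ["pytest", "-q"]),
]


def run_checks(checks):
    """Run (name, command) pairs in order and stop at the first failure.

    Returns the name of the failing check, or None if every check passed.
    """
    for name, command in checks:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Later, slower checks are skipped once one check fails.
            return name
    return None
```

Calling `run_checks(CHECKS)` locally before you push reports a lint error without waiting for the test suite. GitHub Actions behaves the same way by default: when a step fails, the remaining steps in that job are skipped.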

<aside> 🖼️ Visual: CI Pipeline Flow

</aside>

Linting in CI

You already use ruff locally to lint and format your code (see Linting and Formatting, Week 2). In CI, you run the exact same commands as workflow steps. The difference: CI runs them automatically on every push, so unformatted or broken code never reaches main.

The workflow example below includes ruff check src as a lint step and ruff format --check src as a format check. The --check flag verifies formatting without modifying files (CI should check, not fix).

GitHub Actions basics

A workflow file lives in .github/workflows/ci.yml. It has triggers, jobs, and steps.

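name: CI

# Run on pushes to main and on every pull request.
on:
  push:
    branches: ["main"]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so later steps can see the code.
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # requirements.txt must include ruff and pytest for the steps below.
      - name: Install dependencies
        run: pip install -r requirements.txt
      # Fast checks first: lint and format before the slower test run.
      - name: Lint
        run: ruff check src
      - name: Format check
        run: ruff format --check src
      - name: Test
        run: pytest -q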
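
For the Test step to do anything, pytest needs tests to collect. By default it looks for files named test_*.py or *_test.py and for functions whose names start with test_. A minimal sketch follows; the file name and `celsius_to_fahrenheit` are hypothetical stand-ins for your own pipeline logic, which in a real project would be imported from src rather than defined in the test file.

```python
# tests/test_transform.py (hypothetical file name)


def celsius_to_fahrenheit(celsius: float) -> float:
    """Stand-in for a pipeline function; normally imported from src."""
    return celsius * 9 / 5 + 32


def test_freezing_point():
    assert celsius_to_fahrenheit(0) == 32


def test_boiling_point():
    assert celsius_to_fahrenheit(100) == 212
```

If any assertion fails, pytest exits with a non-zero status, which marks the workflow step, and therefore the job, as failed.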

<aside> 📘 Core program connection: You already learned CI ideas in the Systems week. The same workflow model applies, but now you run Python tooling. Refresh here: Core Program - Continuous integration

</aside>

Keep CI fast and deterministic

Secrets in CI

Your workflow needs API keys and Azure credentials, but these must never appear in code. GitHub Secrets stores them encrypted and makes them available to workflows as ${{ secrets.SECRET_NAME }}.

Adding a secret to your repository:

  1. Go to your repository on GitHub.
  2. Click Settings → Secrets and variables → Actions.
  3. Click New repository secret.
  4. Enter a name (e.g. API_KEY) and the value. Click Add secret.

The secret is now available in any workflow in that repository.

<aside> ⌨️ Hands on: Add a secret called API_KEY to your repository. Use a dummy value like test-key-123 for now.

</aside>

In your workflow, reference a secret with ${{ secrets.SECRET_NAME }} and pass it as an environment variable:

      - name: Run pipeline smoke test
        env:
          API_KEY: ${{ secrets.API_KEY }}
        run: python -m src.pipeline --limit 10
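
On the Python side, the secret arrives as an ordinary environment variable. How src.pipeline reads it is not shown in this chapter, so the helper below is only a sketch under that assumption; `require_env` is a hypothetical name.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error.

    The error message names the variable but never includes a value,
    so nothing sensitive ends up in the CI log.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Calling `api_key = require_env("API_KEY")` at the start of the pipeline makes a missing or empty secret fail immediately, instead of surfacing later as a confusing authentication error.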

<aside> ⚠️ GitHub masks secrets in logs, but avoid printing them with echo. If a secret is accidentally exposed, rotate it immediately.

</aside>

Building container images in CI

CI should build the Docker image on every push to confirm the Dockerfile works with the current code. Docker is pre-installed on ubuntu-latest runners, so no extra setup is needed.

      - name: Build image
        run: docker build -t weather-pipeline:${{ github.sha }} .

Tagging with the commit SHA (${{ github.sha }}) gives every build a unique, traceable tag.
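
If a script needs to build the same tag, for example to log which image a run produced, the logic is small. The sketch below is an illustration only: `image_reference` is a hypothetical helper, and it assumes the tag is always the full 40-character commit SHA, which is what ${{ github.sha }} provides.

```python
import re

# A full Git commit SHA-1 is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def image_reference(repository: str, sha: str, registry: str = "") -> str:
    """Build an image reference tagged with a full commit SHA."""
    if not _FULL_SHA.fullmatch(sha):
        raise ValueError("expected a full 40-character commit SHA")
    name = f"{registry}/{repository}" if registry else repository
    return f"{name}:{sha}"
```

Inside a workflow run the same SHA is available to scripts as the GITHUB_SHA environment variable, so `image_reference("weather-pipeline", os.environ["GITHUB_SHA"])` would reproduce the tag used in the build step. Rejecting values such as latest keeps every reference traceable to one commit.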

<aside> ⌨️ Hands on: Add a pytest -q step to your workflow and confirm the job fails when you break a test.

</aside>

Setting up Azure credentials for CI

In the previous chapter you pushed to ACR using az acr login, which uses your personal Azure session. CI runners do not have your browser session, so they need a service principal, an identity created specifically for automation.

A service principal is like a username and password for a machine. Its credentials come as a JSON block that looks like this:

{
  "clientId": "...",
  "clientSecret": "...",
  "subscriptionId": "...",
  "tenantId": "..."
}

Your teacher will create a service principal for your team with the AcrPush role (permission to push images to the registry) and share the JSON with you.
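
A truncated or partially copied JSON block is an easy mistake to make when credentials are shared. The helper below is a hedged sketch of a local sanity check, not an official Azure tool: `missing_credential_fields` is a hypothetical name, and the field list is taken from the JSON block shown above. Run it on text you hold in memory, and never save the credentials to a file in your repository.

```python
import json

# Field names taken from the JSON block shown above.
REQUIRED_FIELDS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def missing_credential_fields(raw_json: str) -> list[str]:
    """Return the required fields that are absent or empty in the JSON text.

    Raises json.JSONDecodeError (a ValueError) if the text is not valid JSON.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return [field for field in REQUIRED_FIELDS if not data.get(field)]
```

An empty list means all four fields are present and non-empty. It does not prove that the values are valid; only a successful login in CI confirms that.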

Store the credentials as a GitHub Secret:

Go to your repository Settings → Secrets and variables → Actions → New repository secret. Name it AZURE_CREDENTIALS and paste the full JSON block as the value.

<aside> ⌨️ Hands on: Ask your teacher for your team's Azure credentials. Store them as AZURE_CREDENTIALS in your GitHub repository secrets.

</aside>

Pushing to Azure Container Registry from CI

Now you can automate the push. The azure/login action reads your AZURE_CREDENTIALS secret to authenticate, and then Docker commands push to your registry.

      - name: Azure login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: ACR login
        run: az acr login --name hyfregistry
      - name: Build and push
        run: |
          docker build -t hyfregistry.azurecr.io/weather-pipeline:${{ github.sha }} .
          docker push hyfregistry.azurecr.io/weather-pipeline:${{ github.sha }}

<aside> 📘 Core program connection: In the Core program you learned about continuous delivery and deployment. Here the "delivery" step is pushing an image to ACR. Review the concept: Core Program - Continuous delivery

</aside>

In Week 6 you will deploy that image to Azure Container Apps so it runs on a schedule or on demand.

Using AI to read CI logs

<aside> 💡 Using AI to help: Paste a failing CI log and the relevant test file into an LLM and ask for the most likely root cause. (⚠️ Ensure no PII or sensitive company data is included!)

</aside>

Always reproduce the issue locally before committing a fix.

<aside> 🤓 Curious Geek: Why CI was invented

Early CI tools like CruiseControl appeared in the early 2000s to reduce integration pain in large codebases. The idea was to merge often and test automatically to avoid "merge hell".

For the full CI lineage from CruiseControl to GitHub Actions, see the optional History of Containers and CI/CD page.

</aside>

Exercises

  1. List three checks you would run in CI for a data pipeline.
  2. Explain why CI should run on pull requests, not only on main.
  3. Add a formatting check to your workflow.
  4. Describe a safe tagging strategy for your pipeline images.
  5. List the secrets your CI pipeline needs to push to ACR.

Knowledge Check

  1. What is the difference between a linting step and a test step in CI?
  2. Why should CI run on pull requests as well as pushes to main?
  3. How should secrets be provided to a CI workflow, and why?
  4. Why should CI push images with commit SHA tags?
  5. Which Azure credentials does CI need to authenticate to ACR?

Extra reading


With the concepts in place, the optional Practice exercises let you rehearse the Dockerfile and CI patterns before tackling the Assignment.


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
