Week 3 - Ingesting and Validating Data

Introduction to Data Ingestion

Ingesting from APIs

Production Error Handling

Reading Multiple File Formats

Data Validation with Pydantic

Writing to Databases

Practice

Assignment: Build a Validated Ingestion Pipeline

Gotchas & Pitfalls

Lesson Plan

🗓️ Lesson Plan

Theme: From Local to External

Welcome to Week 3! Students have read the material and built clean pipelines in Week 2. Today the focus shifts from "processing local data" to "ingesting external data safely." The goal is hands-on practice: fetch from an API, validate with Pydantic, and store in SQLite - all in class.

Goals

By the end of this lesson, students should be able to:

  1. Fetch data from an external API with a timeout and retry logic
  2. Validate incoming records with Pydantic models
  3. Store validated records in SQLite

Schedule

| Time | Activity | Duration |
|------|----------|----------|
| 0:00 | Welcome & Kahoot Quiz | 15 min |
| 0:15 | Live Demo: The Broken Ingestion Script | 15 min |
| 0:30 | Workshop 1: API Fetching + Retry Logic | 25 min |
| 0:55 | Break | 10 min |
| 1:05 | Workshop 2: Pydantic Validation + SQLite Storage | 30 min |
| 1:35 | Assignment Launch: Connecting the Dots | 15 min |
| 1:50 | Q&A & Wrap Up | 10 min |
| 2:00 | End | - |

Total: 2 hours


Kahoot Quiz (15 min)

Goal: Check understanding of the Week 3 reading material before diving in.

Topics to include

  1. Ingestion: What is the difference between local and external data?
  2. Error types: Is a 500 Server Error transient or permanent?
  3. Retry: What does exponential backoff mean? (Wait 1s, 2s, 4s, 8s...)
  4. HTTP: What does status code 429 mean? (Too Many Requests)
  5. CSV gotcha: What type is every value in a CSV file? (String)
  6. Pydantic: What happens when you pass "18.5" (a string) to a float field in Pydantic?
  7. SQL safety: Why should you never use f-strings in SQL queries?
  8. Idempotency: What does an upsert do when the record already exists?
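For quiz question 6, it helps to have the answer ready to show live: Pydantic (v2 assumed here) coerces a numeric string like "18.5" into a real float rather than rejecting it. A minimal sketch:

```python
# Quiz question 6, demonstrated: Pydantic coerces numeric strings to floats.
from pydantic import BaseModel


class Reading(BaseModel):
    temperature_c: float


r = Reading(temperature_c="18.5")  # string in...
print(type(r.temperature_c), r.temperature_c)  # ...float out
```

A non-numeric string such as "abc" would instead raise a ValidationError, which is exactly the contrast Workshop 2 builds on.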

Live Demo: The Broken Ingestion Script (15 min)

Start the class with this script on screen. Ask: "What happens when this script runs at 3 AM and the API is down?"

import requests
import json

data = requests.get(
    "https://api.open-meteo.com/v1/forecast?latitude=55.67&longitude=12.56&hourly=temperature_2m"
).json()

readings = []
for i in range(len(data["hourly"]["time"])):
    readings.append({
        "time": data["hourly"]["time"][i],
        "temp": data["hourly"]["temperature_2m"][i],
    })

with open("weather.json", "w") as f:
    json.dump(readings, f)

print(f"Saved {len(readings)} readings")

Teaching Points (do these live)

  1. Disconnect from WiFi (or use a bad URL) and run it. It crashes with ConnectionError. Ask: "How do we survive this?" (Answer: retry with backoff)
  2. Ask: "What if one temperature value is null?" It gets saved to the file silently. (Answer: validation)
  3. Ask: "What if you run this twice?" The file gets overwritten. All history is lost. (Answer: database with upserts)
  4. Ask: "Where is the timeout?" There is none. If the API hangs, the script hangs forever. (Answer: timeout=10)
  5. Ask: "What if a partner sends the same data as a CSV with different field names?" The script can only read this one API. (Answer: normalization)

Count the problems together. This becomes the motivation for every chapter.
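If time allows, show the smallest possible first fix on screen before moving to the workshops. This sketch addresses only points 1 and 4 (no hang, no crash); retries, validation, and storage come later. The function name `fetch_raw` is ours, not from the original script:

```python
# A minimal first fix (sketch): add a timeout and catch network errors
# so the 3 AM run fails cleanly instead of hanging or crashing.
import requests

URL = "https://api.open-meteo.com/v1/forecast"
PARAMS = {"latitude": 55.67, "longitude": 12.56, "hourly": "temperature_2m"}


def fetch_raw(url=URL, params=PARAMS):
    try:
        response = requests.get(url, params=params, timeout=10)  # never hang forever
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return response.json()
    except requests.exceptions.RequestException as exc:
        print(f"Fetch failed: {exc}")
        return None  # caller decides what an empty run means
```

Returning None here is deliberately crude; Workshop 1 replaces it with proper retry logic.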


Workshop 1: API Fetching + Retry Logic (25 min)

Goal: Students build a resilient API fetcher with retry logic. This covers Chapters 2 and 3 (APIs and Error Handling).

Part A: Basic API Fetch (10 min)

Task: Fetch weather data from Open-Meteo and transform it into a list of dictionaries.

Instructions for students:

  1. Use requests.get() with timeout=10 and params={} (not hardcoded URL)
  2. Call response.raise_for_status() to catch HTTP errors
  3. Transform the response into a list of dicts with keys: station, timestamp, temperature_c, humidity_pct
  4. Print the first 3 records

Success criteria: Running the script prints 3 weather readings for Copenhagen.
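One possible reference solution for Part A, kept on hand in case students get stuck. Splitting the transform out of the fetch makes it testable without a network. The Open-Meteo field names `time` and `temperature_2m` appear in the demo script; `relative_humidity_2m` is assumed here:

```python
# Part A sketch: fetch with timeout + params, then normalize into row dicts.
import requests


def to_records(hourly, station="copenhagen"):
    """Flatten Open-Meteo's column-oriented 'hourly' block into row dicts."""
    return [
        {
            "station": station,
            "timestamp": t,
            "temperature_c": temp,
            "humidity_pct": hum,
        }
        for t, temp, hum in zip(
            hourly["time"], hourly["temperature_2m"], hourly["relative_humidity_2m"]
        )
    ]


def fetch_readings():
    response = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={  # instruction 1: params dict, not a hardcoded URL
            "latitude": 55.67,
            "longitude": 12.56,
            "hourly": "temperature_2m,relative_humidity_2m",
        },
        timeout=10,  # instruction 1: never hang forever
    )
    response.raise_for_status()  # instruction 2: surface HTTP errors
    return to_records(response.json()["hourly"])  # instruction 3


if __name__ == "__main__":
    for record in fetch_readings()[:3]:  # instruction 4: first 3 records
        print(record)
```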

Part B: Add Retry Logic (15 min)

Task: Wrap the fetch in a retry loop with exponential backoff.

Instructions for students:

  1. Add a for attempt in range(max_retries) loop around the request
  2. Catch ConnectionError and Timeout - these are retryable
  3. Wait 2 ** attempt seconds between retries
  4. After the final attempt, return an empty list (do not crash the pipeline)
  5. Test by temporarily using a bad URL, then switching back

Key moment: Have students test with https://httpstat.us/500 to see the retry behavior in action. They should see the exponential delays in real time.

Discussion: "Why wait 1, 2, 4 seconds instead of 1, 1, 1?" (Because if the server is overloaded, constant retries make it worse)
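The retry loop from the instructions above can be sketched as a small wrapper. The name `fetch_with_retry` and the callable-argument design are ours; any equivalent structure students produce is fine:

```python
# Part B sketch: exponential backoff around an arbitrary fetch function.
import time

import requests


def fetch_with_retry(fetch, max_retries=3):
    """Call `fetch` (a zero-argument callable) with exponential backoff.

    ConnectionError and Timeout are transient, so they are retried; any
    other exception is a bug and still propagates. After the final failed
    attempt we return [] so the rest of the pipeline keeps running.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:
            if attempt < max_retries - 1:
                wait = 2 ** attempt  # 1s, 2s, 4s, ...
                print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait}s")
                time.sleep(wait)
    return []  # instruction 4: do not crash the pipeline
```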


Workshop 2: Pydantic Validation + SQLite Storage (30 min)

Goal: Students validate data with Pydantic and store it in SQLite. This covers Chapters 5 and 6.

Part A: Pydantic Model (10 min)

Task: Create a Pydantic model for weather readings.

Instructions for students:

  1. Create a WeatherReading model with station (str), temperature_c (float), humidity_pct (int, 0-100), timestamp (str)
  2. Add Field(ge=0, le=100) constraint on humidity_pct
  3. Add a @field_validator for station that strips whitespace
  4. Try creating a WeatherReading(station="", temperature_c="abc", humidity_pct=150, timestamp="bad") and observe the error message

Key moment: Show the detailed ValidationError output. Compare it to a raw dataclass, which performs no runtime type checking at all, or to the single cryptic ValueError you get from manual float() conversion. Pydantic tells you exactly what is wrong with each field.
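A reference sketch of the model, using the Pydantic v2 API. Note one addition beyond instruction 3: the validator also rejects an empty station, so that the station="" example in instruction 4 actually fails:

```python
# Part A sketch: a WeatherReading model with a constraint and a validator.
from pydantic import BaseModel, Field, ValidationError, field_validator


class WeatherReading(BaseModel):
    station: str
    temperature_c: float
    humidity_pct: int = Field(ge=0, le=100)  # instruction 2: percentage bounds
    timestamp: str

    @field_validator("station")
    @classmethod
    def clean_station(cls, value: str) -> str:
        value = value.strip()  # instruction 3: strip whitespace
        if not value:  # added check so station="" fails as in instruction 4
            raise ValueError("station must not be empty")
        return value


# Instruction 4: three of the four fields fail, and the error names each one.
# (timestamp="bad" passes, because it is typed as a plain str.)
try:
    WeatherReading(station="", temperature_c="abc", humidity_pct=150, timestamp="bad")
except ValidationError as exc:
    print(exc)
```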

Part B: Batch Validation (5 min)

Task: Validate a list of records, separating valid from invalid.

Instructions for students:

  1. Write a validate_batch function that loops through records
  2. Try/except ValidationError for each record
  3. Collect valid records in one list, error details in another
  4. Test with a mix of good and bad records
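The steps above can be sketched as a small function. The error-report shape (index plus `exc.errors()`) is one reasonable choice, not a required format; the inline WeatherReading is a minimal stand-in for the Part A model:

```python
# Part B sketch: split a batch into validated models and error reports.
from pydantic import BaseModel, Field, ValidationError


class WeatherReading(BaseModel):  # minimal stand-in for the Part A model
    station: str
    temperature_c: float
    humidity_pct: int = Field(ge=0, le=100)
    timestamp: str


def validate_batch(records, model=WeatherReading):
    """Return (valid model instances, error reports) for a list of raw dicts."""
    valid, errors = [], []
    for i, record in enumerate(records):
        try:
            valid.append(model(**record))  # instruction 2: per-record try/except
        except ValidationError as exc:
            errors.append({"index": i, "errors": exc.errors()})  # instruction 3
    return valid, errors
```

Keeping the error details (rather than just counting failures) pays off later, when the pipeline needs to report which upstream records were bad.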