
🛠️ Practice

These exercises reinforce the core skills from this week. Each one is short and focused - complete them before starting the assignment. They build on each other, so work through them in order.

Open the workspace once

All Week 3 exercises live under data-track/week-3/ in HYF's Learning-Resources repo. One Codespace covers all six exercises.

<aside> 💻 Open in GitHub Codespaces

</aside>

The repo's .devcontainer/data-track/ boots Python 3.11 + ruff + Pylance for every exercise. From the Codespace's Explorer, navigate into data-track/week-3/exercise_N/.

Prefer your own VS Code? Clone locally instead:

git clone https://github.com/HackYourFuture/Learning-Resources.git
cd Learning-Resources/data-track/week-3
code .

Each exercise folder ships its own requirements.txt (when needed) and a per-exercise README with detailed instructions.

Reference solutions (peek only after attempting)

Each exercise_N/solutions/ folder holds the answer in-place. The starter file is filled with the answer code, the original # TODO comments are preserved, and # WHY ...: notes sit under each non-obvious choice.

Read the WHY notes, not the code. The point is the reasoning, not the syntax.

Spoiler discipline

The solution sits next to your starter under solutions/ rather than on a separate branch. The folder name and the deliberate "open this folder to see the answer" click are the whole barrier, and they are enough. Time-box yourself: 10-30 minutes of honest attempt before you open solutions/. The struggle is where the learning happens.

You can diff your attempt against the reference once you have tried:

diff exercise_1/exercise.py exercise_1/solutions/exercise.py

Exercise 1: Write a Retry Function

This function crashes on the first failure. Your job: make it resilient.

# BAD: no retry, crashes immediately ❌
import requests

def fetch_data(url: str) -> dict:
    response = requests.get(url)
    response.raise_for_status()
    return response.json()

Your task:

Add a max_retries parameter (default 3)
Wrap the request in a retry loop with exponential backoff (2 ** attempt seconds)
Only retry on ConnectionError and Timeout exceptions
After the final attempt, raise the original exception
Print a message before each retry: "Attempt {n} failed, retrying in {wait}s..."

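Success criteria: When the request fails with a ConnectionError or Timeout (for example, an unreachable host), fetch_data retries up to max_retries times with increasing delays, then raises the original exception. Note that fetch_data("https://httpstat.us/500") fails with an HTTPError from raise_for_status(), which the rules above do not retry; it is only retried if you deliberately extend the retry condition to 5xx responses.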
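
To make the backoff schedule concrete, here is a minimal sketch of the delays that 2 ** attempt produces. The helper name backoff_delays is hypothetical, and it assumes attempts are numbered from 0; if you count from 1, the waits become 2, 4, 8 instead. It only computes and prints the schedule. The retry loop itself is still yours to write.

```python
def backoff_delays(max_retries: int = 3) -> list[int]:
    """Return the wait in seconds before each retry, using 2 ** attempt."""
    # Assumption: attempts are numbered from 0, so the first wait is 2 ** 0 = 1s.
    return [2 ** attempt for attempt in range(max_retries)]


# Print the message format from the task for each planned retry.
for n, wait in enumerate(backoff_delays(), start=1):
    print(f"Attempt {n} failed, retrying in {wait}s...")
```

With the default of 3 retries the waits are 1, 2, and 4 seconds, so a fully failing call spends about 7 seconds sleeping before it gives up.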

<aside> 📦 Files: exercise_1/: use the Codespace you opened at the top of this page.

</aside>

Exercise 2: Paginated API Fetch

Write a function that fetches all results from a paginated API endpoint.

Your task:

Create a function fetch_all_pages(base_url: str) -> list[dict]
The API returns {"results": [...], "page": 1, "total_pages": 5}
Loop through pages until page >= total_pages
Collect all results into a single list
Add a 0.5-second delay between requests to be polite

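Use this test URL to practice the request itself. Open-Meteo returns a single, unpaginated response without the results/page/total_pages envelope, so check your paging loop against the local stub in exercise_2/: https://api.open-meteo.com/v1/forecast?latitude=55.67&longitude=12.56&hourly=temperature_2m&forecast_days=1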

Success criteria: The function returns a flat list of all records across all pages. For a single-page API, it returns the results from that one page.
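
The stop condition is easier to see without any networking. The sketch below runs the "until page >= total_pages" loop against a hypothetical in-memory stand-in (FAKE_PAGES, get_page, and collect_all are illustrative names, not part of the exercise files). Your fetch_all_pages still needs the real HTTP call and the 0.5-second delay.

```python
# Hypothetical in-memory stand-in for a paginated API; no network involved.
FAKE_PAGES = {
    1: {"results": [{"id": 1}, {"id": 2}], "page": 1, "total_pages": 3},
    2: {"results": [{"id": 3}], "page": 2, "total_pages": 3},
    3: {"results": [{"id": 4}], "page": 3, "total_pages": 3},
}


def get_page(page: int) -> dict:
    """Return the payload for one page of the fake API."""
    return FAKE_PAGES[page]


def collect_all(get_page_fn) -> list[dict]:
    """Collect results page by page until page >= total_pages."""
    results: list[dict] = []
    page = 1
    while True:
        payload = get_page_fn(page)
        results.extend(payload["results"])  # keep the list flat
        if payload["page"] >= payload["total_pages"]:
            break
        page += 1
    return results


all_results = collect_all(get_page)
print(len(all_results))
```

Checking the condition after collecting the page is what makes the single-page case work: the first payload is stored, then the loop exits.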

<aside> 📦 Files: exercise_2/: includes a local 3-page stub so you can verify offline before pointing at a real API.

</aside>

<aside> 💡 Exercises 1-2 cover the "fetch" side of ingestion. Exercises 3-5 cover the "validate and store" side. Exercise 6 combines everything.

</aside>

Exercise 3: Read and Normalize File Formats

You receive weather data in two file formats, CSV and JSON, with different field names. Normalize them all to the same shape.

Your task:

Create a file data/stations.csv with columns: station_name, temp, humidity, date
Create a file data/readings.json containing a list of objects with keys: station, temperature_c, humidity_pct, timestamp
Write a read_csv_file(path) function that reads the CSV
Write a read_json_file(path) function that reads the JSON
Write a normalize_record(record: dict) -> dict function that outputs the following shape, regardless of the input field names:

   {"station": str, "temperature_c": float, "humidity_pct": int, "timestamp": str}

Success criteria: Both normalize_record(csv_row) and normalize_record(json_row) return dictionaries with the same four keys.
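
One detail that often trips people up: csv.DictReader returns every value as a string, while JSON keeps numbers as numbers. The sketch below shows this with a hand-written one-row sample that uses the column names from the task (the values are made up, not taken from the exercise data).

```python
import csv
import io

# Hand-written sample using the CSV column names from the task; not real data.
sample = "station_name,temp,humidity,date\nCopenhagen,18.5,72,2025-01-15\n"

rows = list(csv.DictReader(io.StringIO(sample)))
row = rows[0]
print(row)

# Every CSV value is a str, so the numeric fields need an explicit cast
# before they match the target shape (float and int).
temperature = float(row["temp"])
humidity = int(row["humidity"])
print(type(row["temp"]).__name__, type(temperature).__name__)
```

This is why normalize_record has to do two jobs for CSV rows: rename the keys and convert the types.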

<aside> 📦 Files: exercise_3/: ships sample data/stations.csv and data/readings.json with deliberately different field names.

</aside>

Exercise 4: Pydantic Validation

Create a Pydantic model and validate a batch of records.

Your task:

Create a WeatherReading model with:

station: str, minimum length 1
temperature_c: float, between -90 and 60
humidity_pct: int, between 0 and 100
timestamp: str

Add a @field_validator for station that strips whitespace and converts to title case
Write a function validate_batch(records: list[dict]) -> tuple[list[WeatherReading], list[dict]] that returns valid records and error details
Test with this data:

test_data = [
    {"station": "copenhagen", "temperature_c": "18.5", "humidity_pct": "72", "timestamp": "2025-01-15T10:00"},
    {"station": "", "temperature_c": "abc", "humidity_pct": "150", "timestamp": "bad"},
    {"station": "  AARHUS  ", "temperature_c": "15.2", "humidity_pct": "65", "timestamp": "2025-01-15T11:00"},
]

Success criteria: validate_batch(test_data) returns 2 valid records (Copenhagen and Aarhus, both title-cased) and 1 error. The error detail includes which fields failed.
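
The cleaning step the validator needs is plain string handling, so you can try it outside Pydantic first. A minimal sketch, using a hypothetical helper name clean_station:

```python
def clean_station(raw: str) -> str:
    """Hypothetical helper: strip surrounding whitespace, then title-case."""
    return raw.strip().title()


print(clean_station("copenhagen"))
print(clean_station("  AARHUS  "))
print(repr(clean_station("   ")))
```

Note the last case: a whitespace-only name becomes an empty string after stripping, so it is worth deciding whether your minimum-length check should see the raw value or the cleaned one.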

<aside> 📦 Files: exercise_4/: use the Codespace you opened at the top of this page.

</aside>

Exercise 5: SQLite Writer

Store validated weather data in a SQLite database.

Your task:

Create a function create_table(db_path: str) that creates a weather_readings table with columns: station, timestamp, temperature_c, humidity_pct, plus a UNIQUE(station, timestamp) constraint
Create a function upsert_readings(db_path: str, readings: list[dict]) that inserts records using ON CONFLICT ... DO UPDATE SET
Create a function query_by_station(db_path: str, station: str) -> list[dict] that returns all readings for a station
Use parameterized queries (? placeholders) for all SQL operations

Success criteria: Insert 3 records, then re-insert the same records with updated temperatures. Query the database and verify only 3 rows exist (not 6), with the updated temperatures.
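
If the ON CONFLICT syntax is new to you, here is a minimal sketch of an upsert on a deliberately different, hypothetical table (prices) in an in-memory database, so it does not give away the exercise. It assumes SQLite 3.24 or newer, which is when ON CONFLICT ... DO UPDATE was added.

```python
import sqlite3

# In-memory database with a hypothetical table, unrelated to the exercise schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (symbol TEXT, day TEXT, price REAL, UNIQUE(symbol, day))"
)

# "excluded" refers to the row that the INSERT tried to add.
upsert_sql = """
INSERT INTO prices (symbol, day, price) VALUES (?, ?, ?)
ON CONFLICT(symbol, day) DO UPDATE SET price = excluded.price
"""

# Same (symbol, day) twice: the second call updates instead of inserting.
conn.execute(upsert_sql, ("ABC", "2025-01-15", 10.0))
conn.execute(upsert_sql, ("ABC", "2025-01-15", 12.5))
conn.commit()

# Parameterized query: the value is passed separately from the SQL text.
rows = conn.execute(
    "SELECT symbol, day, price FROM prices WHERE symbol = ?", ("ABC",)
).fetchall()
print(rows)
conn.close()
```

The table ends up with one row holding the second price, which is the same behavior the success criteria above ask you to verify for weather_readings.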

<aside> 📦 Files: exercise_5/: SQLite ships with Python, no extra install. The generated weather.db is gitignored.

</aside>

<aside> ⚠️ If Exercise 5 feels hard, make sure your Pydantic model from Exercise 4 works first. The database only stores data that passes validation.

</aside>

Exercise 6: Mini Pipeline

Combine all the pieces into a small end-to-end pipeline.

Your task:

Fetch weather data from the Open-Meteo API for Copenhagen (latitude 55.67, longitude 12.56)
Normalize the API response into a list of dictionaries with keys: station, timestamp, temperature_c, humidity_pct
Validate each record with your Pydantic WeatherReading model
Store valid records in a SQLite database using your upsert_readings function
Print a summary: total fetched, valid count, error count, rows in database

# Your pipeline should look something like this:
raw_records = fetch_weather(latitude=55.67, longitude=12.56, days=1)
valid, errors = validate_batch(raw_records)
upsert_readings("weather.db", [r.model_dump() for r in valid])
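print(f"Fetched: {len(raw_records)}, Valid: {len(valid)}, Errors: {len(errors)}")
# The summary should also report the number of rows in the database (for example, via SELECT COUNT(*)).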

Success criteria: Running the script twice produces the same number of rows in the database (upserts, not duplicates). The summary shows 0 errors for clean API data.
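
The normalize step is where most of the work is, because Open-Meteo returns hourly data as parallel lists (one list of times, one list per variable) rather than one object per reading. Below is a minimal sketch of turning that into records. The payload is hand-written with made-up values, the helper name hourly_to_records is hypothetical, and it assumes you requested relative_humidity_2m alongside temperature_2m.

```python
# Hand-written payload in the shape of an Open-Meteo hourly response.
# The values are made up for illustration, not fetched from the API.
payload = {
    "hourly": {
        "time": ["2025-01-15T00:00", "2025-01-15T01:00"],
        "temperature_2m": [3.1, 2.8],
        "relative_humidity_2m": [87, 89],  # assumes this variable was requested
    }
}


def hourly_to_records(payload: dict, station: str) -> list[dict]:
    """Zip the parallel hourly lists into one dict per reading."""
    hourly = payload["hourly"]
    return [
        {
            "station": station,
            "timestamp": time,
            "temperature_c": temp,
            "humidity_pct": humidity,
        }
        for time, temp, humidity in zip(
            hourly["time"],
            hourly["temperature_2m"],
            hourly["relative_humidity_2m"],
        )
    ]


records = hourly_to_records(payload, station="Copenhagen")
print(records[0])
```

The API response does not include a station name, so the sketch passes it in; how you label the station is up to you.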

<aside> 📦 Files: exercise_6/: capstone, uses requests + pydantic + sqlite3. The Open-Meteo call needs network (no API key).

</aside>

Once Exercise 6 runs cleanly, you have built a full ingestion pipeline.

<aside> 💡 Using AI to help: If you get stuck on an exercise, paste the error message and your code (⚠️ Ensure no PII or sensitive company data is included!) into an LLM. Ask it to explain what went wrong and suggest a fix. This is especially useful for debugging SQL syntax and Pydantic validation errors.

</aside>