Week 1 - Python Foundations

🛠️ Practice

These six exercises consolidate everything Week 1 introduced: variables, functions, type hints, logging, debugging, file I/O, and CLI habits. Pick the ones that match what felt shaky on a first read; you can do them in any order.

By the end of this chapter, you should have practiced the core Python skills that the Week 1 assignment in the next chapter combines into one larger task.

Open the workspace once

All six exercises live as subfolders on the same w1 branch of the practice repo. Open it once for the whole week, not once per exercise:

<aside> 💻 Open in GitHub Codespaces

</aside>

Prefer your own editor? Clone locally:

git clone -b w1 https://github.com/lassebenni/hyf-data-track-python-exercises.git
cd hyf-data-track-python-exercises
code .

Either way, each exercise is its own subfolder: exercise_1/, exercise_2/, ..., exercise_6/. Switch between them in the file explorer; one cold-start covers the whole week.

Reference solutions (peek only after attempting)

Reference solutions for all six exercises live on a separate branch, w1-solutions. The w1 branch is intentionally starter-only so you do not accidentally peek before you struggle (which is where the learning happens).

When you finish an exercise, or you are genuinely stuck after ~30 minutes, switch to the solutions branch:

git fetch origin
git checkout w1-solutions

On w1-solutions, each starter file (exercise.py, exercise_2.py, exercise_6.py, assignment_6.py) has been filled in with the answer in place. The original # TODO and # FIXME comments are still there, with the solution code and a # WHY ...: note sitting directly under each one. So the file you read is the question and the answer side by side: TODO above, code + commentary below. Read the WHY comments; do not copy the code verbatim. The point of the reference solution is the commentary, not the answer.

Spoiler discipline

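If you have any uncommitted edits across the repo, git checkout w1-solutions will refuse with "Your local changes ... would be overwritten by checkout". That's Git protecting your work, not breaking it. Two safe options: commit your edits on w1 before switching, or set them aside with git stash and restore them with git stash pop once you are back on w1.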

Exercise 1: The Temperature Logger

Concepts: Variables (Ch2), Functions (Ch4), Type hints (Ch5), Logging (Ch8).

You are building a small weather station script. Your task is to write a function that converts Celsius to Fahrenheit, but it must be “production-ready”.

Instructions:

  1. Import the logging module and configure it to level INFO.
  2. Write a function called convert_c_to_f.
  3. Type Hinting: The function should accept a float and return a float.
  4. Inside the function, calculate the result ((celsius * 9/5) + 32).
  5. Logging: Before returning the value, log an info message: "Converting {celsius}°C to {fahrenheit}°F".
  6. Call the function with three different values (e.g., 0, 25, 100).
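The same shape (type hints on the signature, one log line before the return) applies to any small conversion function. Here is a minimal sketch on a different conversion so the exercise stays yours to solve; the function name km_to_miles and the sample distances are hypothetical, not part of the exercise files.

```python
import logging

# INFO level so that logging.info() messages are shown (the default is WARNING).
logging.basicConfig(level=logging.INFO)


def km_to_miles(kilometres: float) -> float:
    """Convert a distance in kilometres to miles (hypothetical example)."""
    miles = kilometres * 0.621371
    # Log before returning so every call leaves a trace.
    logging.info("Converting %s km to %s mi", kilometres, miles)
    return miles


for distance in (1.0, 5.0, 42.195):
    km_to_miles(distance)
```

Passing the values as arguments to logging.info() instead of building an f-string is one common habit: the message is only formatted if the log level lets it through. An f-string, as the exercise text suggests, works just as well here.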

<aside> 📦 Files: exercise_1/ on the w1 branch (use the Codespace you opened at the top of this page).

</aside>


Exercise 2: The Data Cleaner

Concepts: Lists (Ch2), Loops & conditionals (Ch3), Debugging (Ch7).

You have received a list of user ages from a database, but the data is dirty. Some values are strings, some are numbers, and some are negative (impossible!).

The buggy code:

Copy this into exercise_2.py. It currently crashes.

ages = [25, 30, "40", "not_available", 20, -5]

def calculate_average_age(age_list):
    total = 0
    count = 0
    for age in age_list:
        total += age
        count += 1

    return total / count

print(calculate_average_age(ages))

Instructions:

  1. Run the code and read the Traceback. What kind of error is it?
  2. Use the VS Code Debugger to step through the loop. Find which value causes the crash.
  3. Modify the loop to fix the code:
  4. Print the final correct average.
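One common pattern for a loop over mixed-type values is to attempt the conversion, skip whatever fails, and then filter out values that are numeric but invalid. A minimal sketch on a hypothetical list of prices (deliberately not the exercise data), assuming that unusable entries should be skipped rather than repaired:

```python
raw_prices = [19.99, "5", "n/a", 12, -3]  # hypothetical data, not the exercise list


def clean_prices(values: list) -> list[float]:
    """Return only the values that convert to a non-negative float."""
    cleaned = []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            # Not numeric (e.g. "n/a" or None): skip it instead of crashing.
            continue
        if number < 0:
            # Numeric, but outside the valid range.
            continue
        cleaned.append(number)
    return cleaned


print(clean_prices(raw_prices))  # [19.99, 5.0, 12.0]
```

Catching only TypeError and ValueError, not a bare except, keeps unrelated bugs visible while still tolerating dirty values.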

<aside> 📦 Files: exercise_2/ on the w1 branch (use the Codespace you opened at the top of this page).

</aside>


Exercise 3: The Precision Trap

Concepts: Floating-point math (Ch2), Debugging (Ch7).

You are processing payments for a transaction system. You have a wallet with 0.1 bitcoin, and you receive 0.2 bitcoin. You want to check if you now have exactly 0.3 bitcoin to execute a trade.

The buggy code:

This code prints "Transaction failed?" even though 0.1 + 0.2 should equal 0.3. Even stranger: the wallet balance prints as 0.30000000000000004, not 0.3.

wallet = 0.1
wallet += 0.2

print(f"Wallet balance:{wallet}")

if wallet == 0.3:
    print("Transaction Success!")
else:
    print("Transaction failed?")

Instructions:

  1. Run the code. The print shows Wallet balance:0.30000000000000004 (not 0.3), and the if check fails. Both clues point at the same root cause: Python cannot store 0.1 + 0.2 as exactly 0.3.
  2. Set a breakpoint on the if wallet == 0.3: line.
  3. Start the debugger. Hover your mouse over the wallet variable (or look in the Variables pane).
  4. What is the actual value of wallet?
  5. Binary floating point cannot represent most decimal fractions (such as 0.1) exactly, so small errors creep into sums. Change the code to use the round() function to fix the comparison (e.g., round to 1 decimal place).
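To see the effect outside the wallet example, the sketch below compares a float in several ways. The variable total is a hypothetical stand-in; math.isclose and decimal.Decimal are standard-library alternatives to round() that the exercise does not require.

```python
import math
from decimal import Decimal

total = 0.1 * 3  # hypothetical value; stored as 0.30000000000000004

print(total == 0.3)                          # False: exact comparison
print(round(total, 2) == 0.3)                # True: compare after rounding
print(math.isclose(total, 0.3))              # True: compare within a tolerance
print(Decimal("0.1") * 3 == Decimal("0.3"))  # True: exact decimal arithmetic
```

Note that Decimal is built from strings here. Decimal(0.1) would inherit the float's inexact value.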

<aside> 📦 Files: exercise_3/ on the w1 branch (use the Codespace you opened at the top of this page).

</aside>


Exercise 4: Grade Processor (Final Boss)

Concepts: Dictionaries (Ch2), Type hints (Ch5), Logging (Ch8), branching logic (Ch3).

Combine everything! You need to process student grades.

Instructions:

  1. Create a dictionary representing a student:
student = {"name": "Alice", "grades": [85, 90, 78]}
  2. Write a function process_student(student_data: dict) -> None.
  3. Configure logging to show timestamps.
  4. Inside the function:
  5. Call the function with Alice's data.

<aside> ⚠️ The order matters. If you write three separate if blocks instead of if / elif / else, an average of 85 will trigger both "Grade A" and "Grade B" because both conditions are true. The elif is what makes the ladder pick exactly one branch.

</aside>
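The difference between separate if blocks and an if / elif / else ladder can be sketched as follows. The thresholds and labels are hypothetical and are not the grade boundaries of this exercise; the logging format string is one way to get a timestamp on every line.

```python
import logging

# %(asctime)s adds a timestamp to every log line.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")


def labels_with_separate_ifs(score: float) -> list[str]:
    """Every true condition fires, so one score can collect several labels."""
    labels = []
    if score >= 50:
        labels.append("pass")
    if score >= 80:
        labels.append("distinction")
    return labels


def label_with_ladder(score: float) -> str:
    """Only the first true branch runs, so exactly one label comes back."""
    if score >= 80:
        return "distinction"
    elif score >= 50:
        return "pass"
    else:
        return "fail"


logging.info("Separate ifs: %s", labels_with_separate_ifs(85))  # both labels
logging.info("Ladder: %s", label_with_ladder(85))               # one label
```

In a ladder the strictest condition goes first. If score >= 50 were checked before score >= 80, a score of 85 would stop at "pass".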

Run your function with Alice's grades and confirm only one log line is emitted per student.

<aside> 📦 Files: exercise_4/ on the w1 branch (use the Codespace you opened at the top of this page).

</aside>


Exercise 5: The File Ingestor

Concepts: File I/O & context managers (Ch9), Strings (Ch2).

A big part of data engineering is reading data from files. Let's practice reading a raw text file and processing it.

Instructions:

  1. Create a text file named raw_data.txt in your folder. Add these lines:
amsterdam
rotterdam
the hague
utrecht
  2. Create a Python script exercise_5.py.
  3. Use the with open(...) pattern to read the raw_data.txt file.
  4. Loop through the lines and capitalize each city name (e.g., "Amsterdam").
  5. Store the cleaned names in a list.
  6. Use with open(...) again to write the cleaned names into a new file called processed_data.txt.
  7. Check your folder to see if the new file was created.
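The read, transform, write round trip can be sketched on throwaway data like this. The file names and contents are hypothetical, and the sketch writes into a temporary folder so it does not touch your exercise files. It also shows a detail worth knowing for multi-word names: str.capitalize() only uppercases the first letter of the whole string, while str.title() uppercases each word.

```python
import tempfile
from pathlib import Path

# Work in a throwaway folder (hypothetical file names).
workdir = Path(tempfile.mkdtemp())
source = workdir / "names.txt"
target = workdir / "names_clean.txt"
source.write_text("new york\nparis\n", encoding="utf-8")

cleaned = []
with open(source, encoding="utf-8") as infile:
    for line in infile:
        name = line.strip()  # drop the trailing newline
        if name:             # ignore blank lines
            cleaned.append(name.title())

# "w" creates the file, or overwrites it if it already exists.
with open(target, "w", encoding="utf-8") as outfile:
    for name in cleaned:
        outfile.write(name + "\n")

print("new york".capitalize())  # New york
print("new york".title())       # New York
```

Both with blocks close their file automatically, even if an exception is raised inside the block.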

<aside> 📦 Files: exercise_5/ on the w1 branch (use the Codespace you opened at the top of this page).

</aside>


Exercise 6: The Pipeline CLI

Concepts: CLI arguments via argparse (Ch6), Logging levels (Ch8), pathlib (Ch9).

You are given a tiny CSV cleaner that mostly works, but it has three problems any production pipeline would fix on day one:

  1. The input and output paths are hard-coded. A teammate cannot reuse it on a different file.
  2. It always logs at INFO. You cannot quiet it for a clean run, or crank it up to DEBUG when something looks off.
  3. Every message uses logging.info() regardless of what it actually is. Skipped rows should warn; per-row trace should be DEBUG; the summary is INFO.
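The level split described in the third problem might look like the sketch below. The rows, the logger name cleaner_sketch, and the "missing email" rule are hypothetical and are not taken from the exercise script.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("cleaner_sketch")  # hypothetical logger name

# Hypothetical rows; the real exercise reads them from a CSV file.
rows = [{"id": "1", "email": "a@example.com"}, {"id": "2", "email": ""}]

kept = []
for row in rows:
    # Per-row trace: only visible when the level is DEBUG.
    logger.debug("Inspecting row %s", row["id"])
    if not row["email"]:
        # Something was dropped: worth a WARNING.
        logger.warning("Skipping row %s: missing email", row["id"])
        continue
    kept.append(row)

# One summary line per run at INFO.
logger.info("Kept %d of %d rows", len(kept), len(rows))
```

With the level at INFO, this prints the warning and the summary but none of the debug lines.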

Goal: make the script runnable like this:

python exercise_6.py --input data/messy_users.csv --output cleaned.json --log-level DEBUG
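A command line of that shape can be parsed with argparse roughly as follows. This is a sketch under assumptions, not the reference solution: the help texts, the default level, and the list of allowed levels are choices made for illustration, and the argument list is passed in explicitly so the snippet runs without a real command line.

```python
import argparse
import logging
from pathlib import Path
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse the three flags; argv=None means 'read from sys.argv'."""
    parser = argparse.ArgumentParser(description="Clean a CSV file (sketch).")
    parser.add_argument("--input", type=Path, required=True, help="CSV file to read")
    parser.add_argument("--output", type=Path, required=True, help="file to write")
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        help="how much logging to show",
    )
    return parser.parse_args(argv)


# Simulate the command line from the goal above.
args = parse_args(
    ["--input", "data/messy_users.csv", "--output", "cleaned.json", "--log-level", "DEBUG"]
)

# argparse turns --log-level into the attribute args.log_level.
logging.basicConfig(level=getattr(logging, args.log_level))
logging.debug("Reading %s, writing %s", args.input, args.output)
```

Restricting --log-level with choices means a typo such as DEBG produces a usage error straight away, before any file is touched.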

Instructions: