Week 1 - Python Foundations

Type Hints for Clearer Code

Python is dynamically typed: you don't have to declare variable types. Type hints let you annotate your code with the types you expect, which makes it easier to read and lets tools catch bugs early.

<aside> 💡 Type hints are especially valuable in data pipelines where data flows through many functions. They act as documentation and help tools catch type mismatches before runtime.

</aside>
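
To make "before runtime" concrete, here is a minimal sketch showing that Python itself does not enforce hints: a mismatched call still runs, and only a static tool would flag it. The function name double is a made-up example, not part of the course code.

```python
def double(value: int) -> int:
    """Return the input multiplied by two."""
    return value * 2


# The hint says int, but Python does not check it when the code runs.
print(double(4))     # 8
print(double("ab"))  # abab (multiplying a str by 2 repeats it)

# The annotations are stored on the function so that tools can read them.
print(double.__annotations__)
```

The second call is exactly the kind of silent mismatch a type checker reports while you are still editing.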

Why Use Type Hints?

Without type hints

def process_data(records, threshold):
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results

Questions a reader might have: what is records (a list, and of what?), what type is threshold, and what does the function return?

With type hints

def process_data(records: list[dict], threshold: float) -> list[dict]:
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results

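Now it's clear: records is a list of dictionaries, threshold is a float, and the function returns a list of dictionaries.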

Basic Type Hints

Simple types

# Variables
name: str = "Alice"
age: int = 25
price: float = 19.99
is_active: bool = True

# Functions
def greet(name: str) -> str:
    return f"Hello, {name}!"

def add(a: int, b: int) -> int:
    return a + b

def is_valid(value: str) -> bool:
    return len(value) > 0

None and Optional

# Function that returns None
def print_message(message: str) -> None:
    print(message)

# Function that might return None
def find_user(user_id: int) -> str | None:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found

<aside> ⚠️ The str | None syntax (using |) requires Python 3.10+. On Python 3.9 and earlier, use Optional[str] from the typing module instead.

</aside>
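
For reference, a sketch of the same find_user function written with Optional, as you would on an older interpreter. It also shows the usual follow-up: check for None before using the result.

```python
from typing import Optional


def find_user(user_id: int) -> Optional[str]:
    """Return the user's name, or None when the id is unknown."""
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)


name = find_user(3)
if name is None:
    print("User not found")
else:
    # Inside this branch a type checker knows name is a str.
    print(name.upper())
```

Optional[str] and str | None mean the same thing to a type checker; only the spelling differs.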

Collection Types

Lists

list[str] is a generic type: a built-in collection (list) annotated with the type of its contents (str). Since Python 3.9, you spell these with built-in lowercase types; before 3.9 you had to import List, Dict, etc. from typing.

# List of strings
names: list[str] = ["Alice", "Bob", "Charlie"]

# List of integers
scores: list[int] = [85, 92, 78, 95]

# List of dictionaries
records: list[dict] = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30}
]

Dictionaries

# Dictionary with string keys and integer values
ages: dict[str, int] = {"Alice": 25, "Bob": 30}

# Dictionary with string keys and mixed value types (str, int, or bool)
config: dict[str, str | int | bool] = {
    "host": "localhost",
    "port": 8080,
    "debug": True
}

Tuples

# Tuple with specific types for each position
point: tuple[int, int] = (10, 20)
person: tuple[str, int, bool] = ("Alice", 25, True)
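
Tuple hints are most useful on functions that return several values at once. A small sketch, using a hypothetical helper named min_max:

```python
def min_max(values: list[float]) -> tuple[float, float]:
    """Return the smallest and largest value as a (min, max) pair."""
    return min(values), max(values)


# Unpacking works as usual; the hint tells readers what each position holds.
lowest, highest = min_max([3.5, 1.0, 7.25])
print(lowest, highest)  # 1.0 7.25
```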

Function Type Hints

Multiple parameters

def calculate_discount(
    price: float,
    discount_percent: float,
    min_price: float = 0.0
) -> float:
    """Calculate discounted price with a minimum floor."""
    discounted = price * (1 - discount_percent / 100)
    return max(discounted, min_price)
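
A short usage sketch for the function above (repeated here so the snippet runs on its own). The sample prices are made up; the point is how the default min_price and an explicit floor behave.

```python
def calculate_discount(
    price: float,
    discount_percent: float,
    min_price: float = 0.0
) -> float:
    """Calculate discounted price with a minimum floor."""
    discounted = price * (1 - discount_percent / 100)
    return max(discounted, min_price)


# 20% off 100.0, no floor given, so min_price defaults to 0.0.
print(calculate_discount(100.0, 20.0))  # 80.0

# 90% off 10.0 would be about 1.0, but the floor of 5.0 wins.
print(calculate_discount(10.0, 90.0, min_price=5.0))  # 5.0
```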

Functions as parameters

from typing import Callable

def apply_to_all(
    items: list[str],
    transform: Callable[[str], str]
) -> list[str]:
    """Apply a transformation function to all items."""
    return [transform(item) for item in items]

# Usage
names = ["alice", "bob"]
upper_names = apply_to_all(names, str.upper)  # ["ALICE", "BOB"]
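
Read Callable[[str], str] as "a function that takes one str and returns a str": the inner list holds the argument types, and the last entry is the return type. Any function matching that shape can be passed in, including your own. The helper clean_name below is a hypothetical example.

```python
from typing import Callable


def apply_to_all(
    items: list[str],
    transform: Callable[[str], str]
) -> list[str]:
    """Apply a transformation function to all items."""
    return [transform(item) for item in items]


def clean_name(raw: str) -> str:
    """Strip surrounding whitespace and title-case the name."""
    return raw.strip().title()


# clean_name takes a str and returns a str, so it matches the hint.
print(apply_to_all(["  alice ", "BOB"], clean_name))  # ['Alice', 'Bob']
```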

Type Hints for Data Pipelines

Here's how type hints improve a data pipeline function:

Before: Unclear data flow

def process_sales(data, min_amount):
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

After: Clear data flow

def process_sales(
    data: list[dict[str, float | str]],
    min_amount: float
) -> dict[str, int | float]:
    """
    Filter sales records and calculate summary statistics.

    Args:
        data: List of sales records with 'amount' and other fields.
        min_amount: Minimum amount to include in results.

    Returns:
        Dictionary with 'count' and 'total' keys.
    """
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

Type Aliases

For complex types, create type aliases to keep code readable:

from typing import Callable

# Define type aliases at the top of your module
Record = dict[str, str | int | float | None]
RecordList = list[Record]
TransformFunc = Callable[[Record], Record]

def transform_records(
    records: RecordList,
    transformer: TransformFunc
) -> RecordList:
    """Apply a transformation to each record."""
    return [transformer(record) for record in records]

<aside> 💡 Type aliases are great for data pipelines where you work with the same data structures repeatedly. Define them once and reuse throughout your code.

</aside>

Type Checking with VS Code

Pylance, the language server that ships with VS Code's Python extension, includes a static type checker: it reads your annotations without running the code and flags type mismatches as you type. The CLI equivalent is mypy. Both rely on the same hint syntax this chapter teaches. You opt in gradually, one file or function at a time (a design called gradual typing), so partially-annotated code stays valid Python.
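
A minimal sketch of what gradual typing looks like in one file: one function is left unannotated, the other is fully hinted, and both run side by side. The function names and the 0.21 rate are illustrative assumptions.

```python
def parse_amount(raw):
    """Unannotated on purpose: checkers treat raw and the result as Any."""
    return float(raw)


def add_vat(amount: float, rate: float = 0.21) -> float:
    """Annotated: a checker can verify calls to this function."""
    return amount * (1 + rate)


net = parse_amount("100")
print(round(add_vat(net), 2))  # 121.0
```

You can annotate parse_amount later without touching add_vat, which is how large codebases migrate a piece at a time.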

Enabling type checking

Add to your .vscode/settings.json:

{
    "python.analysis.typeCheckingMode": "basic"
}

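Options: "off" (no type-checking diagnostics), "basic", "standard", and "strict", in increasing order of strictness.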

Example: VS Code catching a type error

def calculate_total(prices: list[float]) -> float:
    return sum(prices)

# VS Code will highlight this as an error:
result = calculate_total("not a list")  # Error: str is not list[float]
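
For contrast, this sketch shows what happens when the same mistake is only discovered at runtime: the call fails with a TypeError once sum tries to add a character to a number.

```python
def calculate_total(prices: list[float]) -> float:
    return sum(prices)


try:
    calculate_total("not a list")
except TypeError as error:
    # sum() starts at 0 and cannot add the first character to it.
    print(f"Runtime failure: {error}")

print(calculate_total([1.5, 2.5]))  # 4.0
```

The static check reports the problem at the call site while you type; the runtime error only appears when that line actually executes.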

<aside> ⌨️ Hands on: Open VS Code and create a function get_average(numbers: list[float]) -> float. Try calling it with wrong types and see how VS Code highlights the errors.

</aside>


<aside> 🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=1&chapter=type_hints&exercise=w1_type_hints__get_average&lang=python

</aside>

Common Patterns in Data Engineering

Processing records

def filter_records(
    records: list[dict[str, str]],
    field: str,
    value: str
) -> list[dict[str, str]]:
    """Filter records where field equals value."""
    return [r for r in records if r.get(field) == value]
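
A usage sketch for filter_records with made-up sample data (the function is repeated so the snippet runs on its own):

```python
def filter_records(
    records: list[dict[str, str]],
    field: str,
    value: str
) -> list[dict[str, str]]:
    """Filter records where field equals value."""
    return [r for r in records if r.get(field) == value]


people = [
    {"name": "Alice", "city": "Amsterdam"},
    {"name": "Bob", "city": "Rotterdam"},
    {"name": "Carla", "city": "Amsterdam"},
]

in_amsterdam = filter_records(people, "city", "Amsterdam")
print([p["name"] for p in in_amsterdam])  # ['Alice', 'Carla']

# r.get() returns None for a missing field, so unknown fields match nothing.
print(filter_records(people, "country", "NL"))  # []
```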

Transformation pipeline

from typing import Callable

def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result
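
A sketch of the pipeline applied to two small, hypothetical transformations. Each one takes a dict and returns a new dict, so both match Callable[[dict], dict].

```python
from typing import Callable


def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result


def strip_name(record: dict) -> dict:
    """Return a copy with whitespace removed around the name."""
    return {**record, "name": record["name"].strip()}


def uppercase_country(record: dict) -> dict:
    """Return a copy with the country code in upper case."""
    return {**record, "country": record["country"].upper()}


rows = [
    {"name": " Alice ", "country": "nl"},
    {"name": "Bob", "country": "be"},
]
cleaned = pipeline(rows, [strip_name, uppercase_country])
print(cleaned)
```

The transformations run in list order, and because each returns a copy, the original rows are left unchanged.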

File operations

from pathlib import Path

def read_file(filepath: str | Path) -> str:
    """Read and return file contents."""
    with open(filepath, "r") as f:
        return f.read()

When NOT to Use Type Hints

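Type hints are optional. Skip them where the type is already obvious from the value, and keep them where they clarify a function's contract: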

# Type hints not needed here - it's obvious
x = 5
name = "Alice"

# Type hints helpful here - clarifies the function's contract
def process_batch(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    ...

<aside> 🤓 Curious Geek: Python's gradual-typing arc

Python type hints were standardized by PEP 484, proposed in 2014 and shipped with Python 3.5 in 2015. Its authors included Guido van Rossum and Jukka Lehtosalo (creator of mypy). Their goal was gradual typing: opt in piece by piece so that a very large dynamically-typed codebase, like those at Dropbox or Instagram, could migrate without a rewrite. Two later PEPs polished the syntax: PEP 585 (Python 3.9, 2020) lets you write list[str] instead of List[str] (no more from typing import List), and PEP 604 (Python 3.10, 2021) lets you write str | None instead of Optional[str]. The runtime still does not enforce the annotations: tools like mypy and Pylance read them statically. That choice keeps Python's "just write code" feel intact, which is arguably a big part of why type hints caught on.

</aside>

The track expects you to add hints to every public function you write from this point on, so the next reasonable step is to use them on something concrete.

<aside> 📝 Practice: The week's Practice chapter has two exercises that use type hints: Ex 1 (the Temperature Logger: annotate c_to_f(celsius: float) -> float) and Ex 4 (Grade Processor: annotate the helpers you split out). Both take a few minutes.

</aside>

🧠 Knowledge Check


<aside> 🚀 Try it in the widget: Interactive Quiz: Type Hints

</aside>

https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_1_ch5_type_hints_quiz&embed=1

Extra reading


Next up: Command-Line Interface Habits, where you leave the editor and start running your typed Python scripts from the terminal, picking up the habits (argument parsing, exit codes, stdout vs stderr) that make a script feel like a proper pipeline step.