Type Hints for Clearer Code

Python is dynamically typed: you don't have to declare variable types. Type hints let you annotate your code with the types you expect, which makes it easier to read and helps tools catch bugs early.

💡 Type hints are especially valuable in data pipelines where data flows through many functions. They act as documentation and help tools catch type mismatches before runtime.
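
It is worth seeing that the interpreter itself does not enforce hints; only external tools act on them. A minimal sketch (the add function here is just an illustration) showing that Python stores the annotations but still runs a call that contradicts them:

```python
def add(a: int, b: int) -> int:
    return a + b

# Python keeps the hints as metadata on the function object...
print(add.__annotations__)

# ...but does not check them when the function runs.
# A type checker would flag this call; the interpreter happily executes it.
result = add("data", "pipeline")
print(result)
```

This is why hints pay off mainly when paired with an editor or type checker that reads them.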

Why Use Type Hints?

Without type hints

def process_data(records, threshold):
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results

Questions a reader might have: What does records contain? Is threshold an int or a float? What does the function return?

With type hints

def process_data(records: list[dict], threshold: float) -> list[dict]:
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results

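Now it's clear: records is a list of dictionaries, threshold is a float, and the function returns a list of dictionaries.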

Basic Type Hints

Simple types

# Variables
name: str = "Alice"
age: int = 25
price: float = 19.99
is_active: bool = True

# Functions
def greet(name: str) -> str:
    return f"Hello, {name}!"

def add(a: int, b: int) -> int:
    return a + b

def is_valid(value: str) -> bool:
    return len(value) > 0

None and Optional

# Function that returns None
def print_message(message: str) -> None:
    print(message)

# Function that might return None
def find_user(user_id: int) -> str | None:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found

⚠️ The str | None syntax (using |) requires Python 3.10+. On Python 3.9 and earlier, use Optional[str] from the typing module instead.
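
For comparison, here is the same find_user function written with Optional, which should behave identically and also works on older Python versions:

```python
from typing import Optional

# Optional[str] means the same thing as str | None.
def find_user(user_id: int) -> Optional[str]:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found

found = find_user(1)     # a str
missing = find_user(99)  # None, so callers must handle that case

if missing is None:
    print("No such user")
```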

Collection Types

Lists

# List of strings
names: list[str] = ["Alice", "Bob", "Charlie"]

# List of integers
scores: list[int] = [85, 92, 78, 95]

# List of dictionaries
records: list[dict] = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30}
]

Dictionaries

# Dictionary with string keys and integer values
ages: dict[str, int] = {"Alice": 25, "Bob": 30}

# Dictionary with string keys and values that may be str, int, or bool
config: dict[str, str | int | bool] = {
    "host": "localhost",
    "port": 8080,
    "debug": True
}

Tuples

# Tuple with specific types for each position
point: tuple[int, int] = (10, 20)
person: tuple[str, int, bool] = ("Alice", 25, True)
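
Tuple hints are also handy for functions that return several values at once. A small sketch, where min_max and row are hypothetical names chosen for illustration; the second part shows the tuple[int, ...] form for tuples of any length:

```python
def min_max(values: list[float]) -> tuple[float, float]:
    """Return the smallest and largest value as a fixed-size pair."""
    return min(values), max(values)

# The two positions unpack into two clearly typed variables.
low, high = min_max([3.5, 1.0, 7.25])

# Use an ellipsis for a tuple of any length where every item has the same type.
row: tuple[int, ...] = (1, 2, 3, 4)
```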

Function Type Hints

Multiple parameters

def calculate_discount(
    price: float,
    discount_percent: float,
    min_price: float = 0.0
) -> float:
    """Calculate discounted price with a minimum floor."""
    discounted = price * (1 - discount_percent / 100)
    return max(discounted, min_price)

Functions as parameters

from typing import Callable

def apply_to_all(
    items: list[str],
    transform: Callable[[str], str]
) -> list[str]:
    """Apply a transformation function to all items."""
    return [transform(item) for item in items]

# Usage
names = ["alice", "bob"]
upper_names = apply_to_all(names, str.upper)  # ["ALICE", "BOB"]
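
Callable[[str], str] accepts any callable that takes one str and returns a str, not just built-in methods. A brief sketch with a hypothetical add_prefix helper and a lambda:

```python
from typing import Callable

def apply_to_all(
    items: list[str],
    transform: Callable[[str], str]
) -> list[str]:
    """Apply a transformation function to all items."""
    return [transform(item) for item in items]

# Hypothetical helper: any function matching (str) -> str fits the hint.
def add_prefix(value: str) -> str:
    return f"user_{value}"

prefixed = apply_to_all(["alice", "bob"], add_prefix)

# A lambda with the same shape works too.
stripped = apply_to_all(["  alice ", "bob  "], lambda s: s.strip())
```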

Type Hints for Data Pipelines

Here's how type hints improve a data pipeline function:

Before: Unclear data flow

def process_sales(data, min_amount):
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

After: Clear data flow

def process_sales(
    data: list[dict[str, float | str]],
    min_amount: float
) -> dict[str, int | float]:
    """
    Filter sales records and calculate summary statistics.

    Args:
        data: List of sales records with 'amount' and other fields.
        min_amount: Minimum amount to include in results.

    Returns:
        Dictionary with 'count' and 'total' keys.
    """
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}
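
One caveat about the hinted version above: because each value is typed float | str, a type checker may object to comparing r["amount"] with a float. One possible refinement is TypedDict from the typing module, which gives each key its own type. The sketch below assumes a hypothetical record schema (order_id and amount); the names SaleRecord and SalesSummary are illustrative, not part of the lesson:

```python
from typing import TypedDict

# Hypothetical schema: the field names are assumptions for illustration.
class SaleRecord(TypedDict):
    order_id: str
    amount: float

class SalesSummary(TypedDict):
    count: int
    total: float

def process_sales(data: list[SaleRecord], min_amount: float) -> SalesSummary:
    """Filter sales records and calculate summary statistics."""
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

sales: list[SaleRecord] = [
    {"order_id": "A1", "amount": 120.0},
    {"order_id": "A2", "amount": 35.5},
    {"order_id": "A3", "amount": 80.0},
]
summary = process_sales(sales, 50.0)
```

At runtime these are still ordinary dictionaries; the extra precision exists only for the type checker and the reader.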

Type Aliases

For complex types, create aliases to keep code readable:

from typing import Callable

# Define type aliases at the top of your module
Record = dict[str, str | int | float | None]
RecordList = list[Record]
TransformFunc = Callable[[Record], Record]

def transform_records(
    records: RecordList,
    transformer: TransformFunc
) -> RecordList:
    """Apply a transformation to each record."""
    return [transformer(record) for record in records]

💡 Type aliases are great for data pipelines where you work with the same data structures repeatedly. Define them once and reuse throughout your code.

Type Checking with VS Code

VS Code with the Python extension (and Pylance) can check types as you write code. Depending on your Pylance version, type checking may be turned off by default, so set the mode explicitly.

Enabling type checking

Add to your .vscode/settings.json:

{
    "python.analysis.typeCheckingMode": "basic"
}

Options: "off" (no type checking), "basic", "standard", and "strict" (the most rigorous checks).

Example: VS Code catching a type error

def calculate_total(prices: list[float]) -> float:
    return sum(prices)

# VS Code will highlight this as an error:
result = calculate_total("not a list")  # Error: str is not list[float]

⌨️ Hands-on: Open VS Code and create a function get_average(numbers: list[float]) -> float. Try calling it with wrong types and see how VS Code highlights the errors.

Common Patterns in Data Engineering

Processing records

def filter_records(
    records: list[dict[str, str]],
    field: str,
    value: str
) -> list[dict[str, str]]:
    """Filter records where field equals value."""
    return [r for r in records if r.get(field) == value]
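
A quick usage sketch with made-up order data. Because the function uses r.get(field), records that lack the field are skipped rather than raising a KeyError:

```python
def filter_records(
    records: list[dict[str, str]],
    field: str,
    value: str
) -> list[dict[str, str]]:
    """Filter records where field equals value."""
    return [r for r in records if r.get(field) == value]

# Made-up sample data; the third record has no "status" key.
orders = [
    {"id": "1", "status": "shipped"},
    {"id": "2", "status": "pending"},
    {"id": "3"},
]
shipped = filter_records(orders, "status", "shipped")
```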

Transformation pipeline

from typing import Callable

def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result
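
As an illustration, the pipeline can be fed small single-purpose functions. The two transformations below (normalize_name and add_is_adult) are hypothetical examples, and each returns a new dictionary rather than mutating its input:

```python
from typing import Callable

def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result

# Hypothetical transformations, each matching Callable[[dict], dict].
def normalize_name(record: dict) -> dict:
    return {**record, "name": record["name"].strip().title()}

def add_is_adult(record: dict) -> dict:
    return {**record, "is_adult": record["age"] >= 18}

people = [{"name": "  alice ", "age": 25}, {"name": "BOB", "age": 15}]
result = pipeline(people, [normalize_name, add_is_adult])
```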

File operations

from pathlib import Path

def read_file(filepath: str | Path) -> str:
    """Read and return file contents."""
    with open(filepath, "r", encoding="utf-8") as f:
        return f.read()

When NOT to Use Type Hints

Type hints are optional. Skip them when the type is already obvious from the value, as with simple local variables:

# Type hints not needed here - it's obvious
x = 5
name = "Alice"

# Type hints helpful here - clarifies the function's contract
def process_batch(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    ...
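
The lesson leaves the body of process_batch unspecified. Purely as an assumption based on its signature, one plausible implementation splits the records into consecutive chunks of at most batch_size items:

```python
def process_batch(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    """Assumed behavior: split records into consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Slice the list in steps of batch_size; the last batch may be shorter.
    return [
        records[start:start + batch_size]
        for start in range(0, len(records), batch_size)
    ]

rows = [{"id": i} for i in range(5)]
batches = process_batch(rows, batch_size=2)
```

Here the return hint list[list[dict]] tells the reader about the nesting before they read the body.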

🧠 Knowledge Check

  1. Do Python type hints actually prevent you from passing the wrong type when the code runs?
  2. How would you type hint a function argument that can be either a str or None?
  3. Why are type hints especially useful when working with data pipelines in a team?
