Week 1 - Foundational Python

Type Hints for Clearer Code

Python is dynamically typed: you don't have to declare variable types. Type hints let you annotate your code with the types you expect, which makes it easier to read and lets editors and type checkers catch bugs before the code runs.

<aside> 💡 Type hints are especially valuable in data pipelines where data flows through many functions. They act as documentation and help tools catch type mismatches before runtime.

</aside>
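
Type hints describe intent; Python itself does not enforce them when the code runs. A minimal sketch, using a hypothetical `double` function that is not part of the lesson's own code, shows a call with the wrong type still executing:

```python
def double(value: int) -> int:
    """Return twice the value. The hints document intent only."""
    return value * 2

# The hint says int, but Python does not check it at runtime.
# A str argument runs without error because str also supports `*`.
result = double("ab")
print(result)  # abab
```

This is why the checking has to come from a tool such as an editor or a type checker, not from the interpreter.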

Why Use Type Hints?

Without type hints

def process_data(records, threshold):
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results

Questions a reader might have: What is records (a list, a dictionary)? What type is threshold? What does the function return?

With type hints

def process_data(records: list[dict], threshold: float) -> list[dict]:
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results

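Now it's clear: records is a list of dictionaries, threshold is a float, and the function returns a list of dictionaries.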

Basic Type Hints

Simple types

# Variables
name: str = "Alice"
age: int = 25
price: float = 19.99
is_active: bool = True

# Functions
def greet(name: str) -> str:
    return f"Hello, {name}!"

def add(a: int, b: int) -> int:
    return a + b

def is_valid(value: str) -> bool:
    return len(value) > 0

None and Optional

# Function that returns None
def print_message(message: str) -> None:
    print(message)

# Function that might return None
def find_user(user_id: int) -> str | None:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found

<aside> ⚠️ The str | None syntax (using |) requires Python 3.10+. For Python 3.9, use Optional[str] from the typing module instead.

</aside>
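
As a sketch of the Python 3.9 alternative mentioned above, the same `find_user` function can be written with `Optional[str]`, which means the same thing as `str | None`:

```python
from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    """Return the user's name, or None if the ID is unknown."""
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)

print(find_user(1))   # Alice
print(find_user(99))  # None
```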

Collection Types

Lists

# List of strings
names: list[str] = ["Alice", "Bob", "Charlie"]

# List of integers
scores: list[int] = [85, 92, 78, 95]

# List of dictionaries
records: list[dict] = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30}
]

Dictionaries

# Dictionary with string keys and integer values
ages: dict[str, int] = {"Alice": 25, "Bob": 30}

# Dictionary with string keys and mixed value types (str, int, or bool)
config: dict[str, str | int | bool] = {
    "host": "localhost",
    "port": 8080,
    "debug": True
}

Tuples

# Tuple with specific types for each position
point: tuple[int, int] = (10, 20)
person: tuple[str, int, bool] = ("Alice", 25, True)

Function Type Hints

Multiple parameters

def calculate_discount(
    price: float,
    discount_percent: float,
    min_price: float = 0.0
) -> float:
    """Calculate discounted price with a minimum floor."""
    discounted = price * (1 - discount_percent / 100)
    return max(discounted, min_price)
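
To see how the default value and the floor interact, here is a short usage sketch. The function is repeated so the snippet runs on its own; the sample prices are made up for illustration:

```python
def calculate_discount(
    price: float,
    discount_percent: float,
    min_price: float = 0.0
) -> float:
    """Calculate discounted price with a minimum floor."""
    discounted = price * (1 - discount_percent / 100)
    return max(discounted, min_price)

# 50% off 100.0, using the default floor of 0.0
print(calculate_discount(100.0, 50.0))  # 50.0

# 50% off 10.0 would be 5.0, but the floor of 8.0 wins
print(calculate_discount(10.0, 50.0, min_price=8.0))  # 8.0
```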

Functions as parameters

from typing import Callable

def apply_to_all(
    items: list[str],
    transform: Callable[[str], str]
) -> list[str]:
    """Apply a transformation function to all items."""
    return [transform(item) for item in items]

# Usage
names = ["alice", "bob"]
upper_names = apply_to_all(names, str.upper)  # ["ALICE", "BOB"]

Type Hints for Data Pipelines

Here's how type hints improve a data pipeline function:

Before: Unclear data flow

def process_sales(data, min_amount):
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

After: Clear data flow

def process_sales(
    data: list[dict[str, float | str]],
    min_amount: float
) -> dict[str, int | float]:
    """
    Filter sales records and calculate summary statistics.

    Args:
        data: List of sales records with 'amount' and other fields.
        min_amount: Minimum amount to include in results.

    Returns:
        Dictionary with 'count' and 'total' keys.
    """
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

Type Aliases

For complex types, create aliases to keep code readable:

from typing import Callable

# Define type aliases at the top of your module
Record = dict[str, str | int | float | None]
RecordList = list[Record]
TransformFunc = Callable[[Record], Record]

def transform_records(
    records: RecordList,
    transformer: TransformFunc
) -> RecordList:
    """Apply a transformation to each record."""
    return [transformer(record) for record in records]

<aside> 💡 Type aliases are great for data pipelines where you work with the same data structures repeatedly. Define them once and reuse throughout your code.

</aside>

Type Checking with VS Code

VS Code with the Python extension (and Pylance) can check types as you write code once type checking is turned on.

Enabling type checking

Add to your .vscode/settings.json:

{
    "python.analysis.typeCheckingMode": "basic"
}

Options: "off", "basic", "standard", and "strict", in increasing order of strictness.

Example: VS Code catching a type error

def calculate_total(prices: list[float]) -> float:
    return sum(prices)

# VS Code will highlight this as an error:
result = calculate_total("not a list")  # Error: str is not list[float]

<aside> ⌨️ Hands-on: Open VS Code and create a function get_average(numbers: list[float]) -> float. Try calling it with wrong types and see how VS Code highlights the errors.

</aside>
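
One possible starting point for the exercise is sketched below. Raising `ValueError` for an empty list is an assumption made here, not something the exercise requires:

```python
def get_average(numbers: list[float]) -> float:
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not numbers:
        # Assumption: an empty list is treated as an error
        raise ValueError("numbers must not be empty")
    return sum(numbers) / len(numbers)

print(get_average([2.0, 4.0, 6.0]))  # 4.0

# With type checking enabled, the editor should flag calls like these
# (left commented out so the file still runs):
# get_average("1, 2, 3")
# get_average(42)
```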

Common Patterns in Data Engineering

Processing records

def filter_records(
    records: list[dict[str, str]],
    field: str,
    value: str
) -> list[dict[str, str]]:
    """Filter records where field equals value."""
    return [r for r in records if r.get(field) == value]

Transformation pipeline

from typing import Callable

def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result
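
A short usage sketch of this pattern follows. The two step functions, `strip_name` and `add_name_length`, are hypothetical examples; any function that takes a dict and returns a dict would fit:

```python
from typing import Callable

def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result

def strip_name(record: dict) -> dict:
    """Hypothetical step: trim whitespace around the name."""
    return {**record, "name": record["name"].strip()}

def add_name_length(record: dict) -> dict:
    """Hypothetical step: add the length of the name as a new field."""
    return {**record, "name_length": len(record["name"])}

rows = [{"name": "  Alice "}, {"name": "Bob"}]
print(pipeline(rows, [strip_name, add_name_length]))
# [{'name': 'Alice', 'name_length': 5}, {'name': 'Bob', 'name_length': 3}]
```

The steps run in list order, so the length is computed on the already stripped name.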

File operations

from pathlib import Path

def read_file(filepath: str | Path) -> str:
    """Read and return file contents."""
    with open(filepath, "r") as f:
        return f.read()

When NOT to Use Type Hints

Type hints are optional. Skip them when the type is already obvious from the code, as with the simple assignments below:

# Type hints not needed here - it's obvious
x = 5
name = "Alice"

# Type hints helpful here - clarifies the function's contract
def process_batch(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    ...

🧠 Knowledge Check

  1. Do Python type hints actually prevent you from passing the wrong type when the code runs?
  2. How would you type hint a function argument that can be either a str or None?
  3. Why are type hints especially useful when working with data pipelines in a team?
