Python is dynamically typed: you don't have to declare variable types. Type hints let you annotate your code with the types you expect, which makes it easier to understand and helps tools catch bugs early.
💡 Type hints are especially valuable in data pipelines where data flows through many functions. They act as documentation and help tools catch type mismatches before runtime.
def process_data(records, threshold):
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results
Questions a reader might have:
- records? A list? A dictionary?
- threshold? Integer? Float?

def process_data(records: list[dict], threshold: float) -> list[dict]:
    results = []
    for record in records:
        if record["score"] > threshold:
            results.append(record)
    return results
Now it's clear:
- records is a list of dictionaries
- threshold is a float

# Variables
name: str = "Alice"
age: int = 25
price: float = 19.99
is_active: bool = True
# Functions
def greet(name: str) -> str:
    return f"Hello, {name}!"

def add(a: int, b: int) -> int:
    return a + b

def is_valid(value: str) -> bool:
    return len(value) > 0

# Function that returns None
def print_message(message: str) -> None:
    print(message)

# Function that might return None
def find_user(user_id: int) -> str | None:
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)  # Returns None if not found
⚠️ The str | None syntax (using |) requires Python 3.10+. On Python 3.9 and earlier, use Optional[str] from the typing module instead.
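For interpreters older than 3.10, the same function can be written with Optional. This is a minimal sketch that reuses the lesson's find_user; Optional[str] means the same thing to a type checker as str | None.

```python
from typing import Optional


def find_user(user_id: int) -> Optional[str]:
    """Return the user's name, or None when the ID is unknown."""
    users = {1: "Alice", 2: "Bob"}
    return users.get(user_id)


print(find_user(1))  # Alice
print(find_user(3))  # None
```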
# List of strings
names: list[str] = ["Alice", "Bob", "Charlie"]
# List of integers
scores: list[int] = [85, 92, 78, 95]
# List of dictionaries
records: list[dict] = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30}
]
# Dictionary with string keys and integer values
ages: dict[str, int] = {"Alice": 25, "Bob": 30}
# Dictionary with string keys and values of several possible types
config: dict[str, str | int | bool] = {
    "host": "localhost",
    "port": 8080,
    "debug": True
}
# Tuple with specific types for each position
point: tuple[int, int] = (10, 20)
person: tuple[str, int, bool] = ("Alice", 25, True)
def calculate_discount(
    price: float,
    discount_percent: float,
    min_price: float = 0.0
) -> float:
    """Calculate discounted price with a minimum floor."""
    discounted = price * (1 - discount_percent / 100)
    return max(discounted, min_price)
from typing import Callable

def apply_to_all(
    items: list[str],
    transform: Callable[[str], str]
) -> list[str]:
    """Apply a transformation function to all items."""
    return [transform(item) for item in items]

# Usage
names = ["alice", "bob"]
upper_names = apply_to_all(names, str.upper) # ["ALICE", "BOB"]
Here's how type hints improve a data pipeline function:
# Without type hints
def process_sales(data, min_amount):
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}

# With type hints and a docstring
def process_sales(
    data: list[dict[str, float | str]],
    min_amount: float
) -> dict[str, int | float]:
    """
    Filter sales records and calculate summary statistics.

    Args:
        data: List of sales records with 'amount' and other fields.
        min_amount: Minimum amount to include in results.

    Returns:
        Dictionary with 'count' and 'total' keys.
    """
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}
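One limitation of the annotated version: dict[str, float | str] tells a checker that any value might be a string, so a stricter checker may question the comparison r["amount"] >= min_amount. A possible refinement, which is not part of the original lesson, is typing.TypedDict (Python 3.8+), where each key gets its own type. The names SaleRecord and SalesSummary and the 'region' field are hypothetical, chosen only for illustration.

```python
from typing import TypedDict


class SaleRecord(TypedDict):
    # Hypothetical record shape; the lesson only guarantees an 'amount' field.
    amount: float
    region: str


class SalesSummary(TypedDict):
    count: int
    total: float


def process_sales(data: list[SaleRecord], min_amount: float) -> SalesSummary:
    """Filter sales records and calculate summary statistics."""
    filtered = [r for r in data if r["amount"] >= min_amount]
    total = sum(r["amount"] for r in filtered)
    return {"count": len(filtered), "total": total}


sales: list[SaleRecord] = [
    {"amount": 120.0, "region": "north"},
    {"amount": 35.5, "region": "south"},
    {"amount": 80.0, "region": "north"},
]
summary = process_sales(sales, min_amount=50.0)
print(summary)  # {'count': 2, 'total': 200.0}
```

At runtime a TypedDict is still a plain dict, so the function body is unchanged; only the information available to the checker differs.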
For complex types, create aliases to keep code readable:
from typing import Callable

# Define type aliases at the top of your module
Record = dict[str, str | int | float | None]
RecordList = list[Record]
TransformFunc = Callable[[Record], Record]

def transform_records(
    records: RecordList,
    transformer: TransformFunc
) -> RecordList:
    """Apply a transformation to each record."""
    return [transformer(record) for record in records]
💡 Type aliases are great for data pipelines where you work with the same data structures repeatedly. Define them once and reuse throughout your code.
VS Code with the Python extension (and Pylance) automatically checks types as you write code.
Add to your .vscode/settings.json:
{
    "python.analysis.typeCheckingMode": "basic"
}
Options:
- "off": No type checking
- "basic": Catch common errors (recommended to start)
- "strict": Comprehensive checking

def calculate_total(prices: list[float]) -> float:
    return sum(prices)

# VS Code will highlight this as an error:
result = calculate_total("not a list") # Error: str is not list[float]
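That error comes from the editor, not from Python itself. The interpreter stores annotations but does not enforce them when the code runs, which is why a checker is needed to catch mismatches early. A small sketch, reusing greet from earlier in the lesson:

```python
def greet(name: str) -> str:
    return f"Hello, {name}!"


# An int is passed where a str is annotated; Python runs the call anyway.
message = greet(42)
print(message)  # Hello, 42!

# The hints are only stored as metadata that tools can read.
print(greet.__annotations__)
```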
⌨️ Hands-on: Open VS Code and create a function get_average(numbers: list[float]) -> float. Try calling it with wrong types and see how VS Code highlights the errors.
def filter_records(
    records: list[dict[str, str]],
    field: str,
    value: str
) -> list[dict[str, str]]:
    """Filter records where field equals value."""
    return [r for r in records if r.get(field) == value]

from typing import Callable

def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result
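To see how the pipeline signature is meant to be used, here is a minimal sketch with two hypothetical steps, strip_name and add_name_length. Each one takes a dict and returns a dict, so both match Callable[[dict], dict]; the sample rows are made up.

```python
from typing import Callable


def pipeline(
    data: list[dict],
    transformations: list[Callable[[dict], dict]]
) -> list[dict]:
    """Apply a series of transformations to data."""
    result = data
    for transform in transformations:
        result = [transform(record) for record in result]
    return result


def strip_name(record: dict) -> dict:
    # Hypothetical step: trim whitespace around the name.
    return {**record, "name": record["name"].strip()}


def add_name_length(record: dict) -> dict:
    # Hypothetical step: derive a new field from the (already cleaned) name.
    return {**record, "name_length": len(record["name"])}


rows = [{"name": "  Alice "}, {"name": "Bob"}]
# Steps run in list order, so the length is computed after stripping.
cleaned_rows = pipeline(rows, [strip_name, add_name_length])
print(cleaned_rows)
```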
from pathlib import Path

def read_file(filepath: str | Path) -> str:
    """Read and return file contents."""
    with open(filepath, "r") as f:
        return f.read()

Type hints are optional. You can skip them when the type is obvious from the value, and add them where they clarify a function's contract:
# Type hints not needed here - it's obvious
x = 5
name = "Alice"
# Type hints helpful here - clarifies the function's contract
def process_batch(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    ...
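The lesson leaves the body of process_batch elided. The signature alone suggests one plausible behavior: splitting the records into consecutive chunks. The sketch below assumes exactly that and is not the lesson's implementation; the ValueError check is also an added assumption.

```python
def process_batch(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    """Split records into consecutive batches of at most batch_size items."""
    # Assumed guard: a non-positive size would make range() fail or loop nowhere.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


batches = process_batch([{"id": n} for n in range(5)], batch_size=2)
print([len(batch) for batch in batches])  # [2, 2, 1]
```

The return annotation list[list[dict]] is what tells the reader, before looking at the body, that the result is a list of batches and not a flat list.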