⚠️ Week 1 Gotchas & Pitfalls

Welcome to the "Gotchas" section. These are the subtle traps that every Python developer falls into at least once. We're telling you now so you can spot them when (not if) they happen to you.

1. The Mutable Default Argument

This is the most famous Python gotcha.

The Misconception

You think that students=[] in a function signature creates a new empty list every time you call the function.

The Reality

Python creates the default object once when the function is defined, not when it is called. If you modify that object, the change persists for the next call!

<aside> 🎬 Terminal Tutorial: Mutable Defaults

</aside>

https://gist.githack.com/lassebenni/775e3f38218372cf3236a8b3d923a794/raw/mutable_defaults_terminal.html

Example

# BAD ❌
def add_student(name, students=[]):
    students.append(name)
    return students

print(add_student("Alice"))  # ['Alice'] - Good
print(add_student("Bob"))    # ['Alice', 'Bob'] - WAIT WHAT? (Alice is still there!)

# GOOD ✅
def add_student(name, students=None):
    if students is None:
        students = []  # Create a new list inside the function
    students.append(name)
    return students
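
To see that the default really is created once, you can inspect the function object itself. The sketch below reuses the buggy add_student from above; the extra names (Carol, Dave, first, second) are only for illustration.

```python
def add_student(name, students=[]):  # the buggy version from above
    students.append(name)
    return students

# The default list is stored on the function object when "def" runs.
print(add_student.__defaults__)  # ([],)

add_student("Alice")
add_student("Bob")

# Both calls mutated that one stored list.
print(add_student.__defaults__)  # (['Alice', 'Bob'],)

first = add_student("Carol")
second = add_student("Dave")
print(first is second)  # True: every call returns the same shared list
```

The same applies to any mutable default, such as a dict or a set, which is why the None pattern above is the usual fix.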

2. The Precision Trap (Floats)

The Misconception

You think 0.1 + 0.2 equals 0.3.

The Reality

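Python floats are stored in binary (base 2) floating point, which cannot represent fractions like 0.1 exactly, just as 1/3 cannot be written exactly in decimal. The tiny rounding errors surface as soon as you compare results with ==.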

<aside> 🎬 Terminal Tutorial: Float Precision

</aside>

https://gist.githack.com/lassebenni/82ff299dbeffcb33efdcb7cbf28ccfa6/raw/float_precision_terminal.html

Example

print(0.1 + 0.2 == 0.3)  # False!
print(0.1 + 0.2)         # 0.30000000000000004

<aside> ⚠️ Data Engineering Rule: Never use float types for money. Use integers (cents) or the decimal module.

</aside>
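
A minimal sketch of the usual workarounds, using only the standard library. The price values and variable names are made up for illustration.

```python
import math
from decimal import Decimal

# Comparing floats: use a tolerance instead of ==
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Option 1: store money as integer cents
price_cents = 1999                   # 19.99 stored as cents
total_cents = price_cents * 3
print(total_cents)                   # 5997
print(f"{total_cents // 100}.{total_cents % 100:02d}")  # 59.97

# Option 2: Decimal, built from strings rather than floats
total = Decimal("0.1") + Decimal("0.2")
print(total == Decimal("0.3"))       # True

# Building a Decimal from a float carries the float's error along
print(Decimal(0.1) == Decimal("0.1"))  # False
```

Note the last line: Decimal only helps if the value never passes through a float on the way in.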

3. Shadowing Built-in Names

The Misconception

You need a variable to store a list of users, so you call it list.

The Reality

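You just rebound the name list to your own data, hiding Python's built-in list() in that scope. Literals like [] still work, but any later call to list(...) fails because the name now points to a list object, not the type.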

<aside> 🎬 Terminal Tutorial: Shadowing Built-ins

</aside>

https://gist.githack.com/lassebenni/88a4647651d034d1f39e5e2279a9985f/raw/shadowing_terminal.html

Example

# BAD ❌
list = [1, 2, 3]  # You just killed the list() function
my_tuple = (4, 5)
numbers = list(my_tuple)  # TypeError: 'list' object is not callable

Common names to avoid: list, str, dict, set, type, id, min, max, sum.
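
The sketch below shows the failure and the fix side by side. The function names and user_ids are hypothetical; the shadowing is kept inside a function so it does not affect the rest of the file.

```python
import builtins

def broken_convert(values):
    list = [1, 2, 3]                 # local name shadows the built-in here
    return list(values)              # TypeError: 'list' object is not callable

def safe_convert(values):
    user_ids = [1, 2, 3]             # descriptive name, built-in untouched
    return user_ids + list(values)

try:
    broken_convert((4, 5))
except TypeError as exc:
    print(exc)                       # 'list' object is not callable

print(safe_convert((4, 5)))          # [1, 2, 3, 4, 5]

# The original is never destroyed; it still lives in the builtins module.
print(builtins.list is list)         # True at module level
```

A plural or descriptive name (users, user_ids, price_by_sku) sidesteps the problem and usually reads better anyway.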

4. The Path Separator Trap

The Misconception

You hardcode file paths using slashes, like data/file.csv or data\file.csv.

The Reality

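Windows uses backslashes \ as its native separator. Mac/Linux use forward slashes /. A hardcoded backslash path breaks on Mac/Linux, and in a normal Python string a backslash can be read as an escape sequence (the \f in "data\file.csv" becomes a form feed character). Forward slashes usually work on Windows too, but paths glued together from strings are still fragile. Build paths from their parts instead.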

Example

# BAD ❌
filename = "data\\exports\\users.csv"  # Breaks on Mac/Linux

# GOOD ✅
import os
filename = os.path.join("data", "exports", "users.csv")

# BETTER (Python 3) ✅
from pathlib import Path
filename = Path("data") / "exports" / "users.csv"
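
A short sketch of what pathlib does with the same parts on each platform. PurePosixPath and PureWindowsPath never touch the file system, so both flavours can be printed on any machine; the parts tuple is just the example path from above.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

parts = ("data", "exports", "users.csv")

# Same parts, different separators depending on the flavour.
print(PurePosixPath(*parts))    # data/exports/users.csv
print(PureWindowsPath(*parts))  # data\exports\users.csv

# Path picks the right flavour for the machine running the script.
filename = Path("data") / "exports" / "users.csv"
print(filename.name)            # users.csv
print(filename.suffix)          # .csv

# A backslash in a normal string literal can silently become an escape:
print(len("data\file.csv"))     # 12, because "\f" is one form feed character
print(len(r"data\file.csv"))    # 13 with a raw string
```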

5. The Reference Trap

The Misconception

You think list_b = list_a creates a second, independent list.

The Reality

Both variables point to the same object in memory. If you change one, you change both.

<aside> 🎬 Terminal Tutorial: Reference Trap

</aside>

https://gist.githack.com/lassebenni/18b0b7ab9af7e68def9c79233fbc3f46/raw/reference_trap_terminal.html

Example 1: Variable Assignment

# BAD ❌
original = [1, 2, 3]
copy = original  # This is NOT a copy!
copy.append(4)

print(original) # [1, 2, 3, 4] - The original was changed too!

# GOOD ✅
original = [1, 2, 3]
copy = original.copy()  # Or: list(original) or original[:]
copy.append(4)

print(original) # [1, 2, 3] - Safe!
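
One caveat worth knowing: .copy() is a shallow copy, so nested lists inside it are still shared. The sketch below uses the is operator to check identity and copy.deepcopy for nested data; the variable names are illustrative.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                  # same object, second name
shallow = original.copy()         # new outer list, same inner lists
deep = copy.deepcopy(original)    # new outer list AND new inner lists

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[0] is original[0])  # True: the inner list is still shared

shallow[0].append(99)
print(original)                   # [[1, 2, 99], [3, 4]]
print(deep)                       # [[1, 2], [3, 4]]
```

For flat lists of numbers or strings, .copy() is enough. Reach for deepcopy when the list contains other lists or dicts that you intend to modify.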

Example 2: Function Arguments (Pass-by-Object-Reference)

This is even more dangerous when passing lists to functions.

# BAD ❌
def add_bonus(scores):
    scores.append(10)  # Modifies the ORIGINAL list outside the function!

my_scores = [90, 80]
add_bonus(my_scores)
print(my_scores)  # [90, 80, 10] - The original list was changed!

# GOOD ✅
def add_bonus(scores):
    new_scores = scores.copy()  # Create a local copy first
    new_scores.append(10)
    return new_scores

my_scores = [90, 80]
new_scores = add_bonus(my_scores)
print(my_scores)  # [90, 80] - Safe!
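
The function receives a reference to the same list object, so what matters is whether the function mutates that object or rebinds its local name. A small sketch of the difference, with hypothetical function names:

```python
def rebind(scores):
    scores = scores + [10]   # builds a NEW list; only the local name changes
    return scores

def mutate(scores):
    scores += [10]           # in-place extend; the caller's list changes

my_scores = [90, 80]

rebind(my_scores)
print(my_scores)  # [90, 80]

mutate(my_scores)
print(my_scores)  # [90, 80, 10]
```

The two lines look almost identical, which is why returning a new list (as in the GOOD version above) is the safer habit.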

6. The Import Side-Effect

The Misconception

You think import my_utils only gives you access to the functions inside it.

The Reality

Python executes the entire file when you import it. If you have "loose" print statements or file writes at the bottom of your utility script, they will run every time you import it!

<aside> 🎬 Terminal Tutorial: Import Side-effects

</aside>

https://gist.githack.com/lassebenni/de0bd1367c3081c42eb314b31c378399/raw/import_side_effects_terminal.html

Example

# my_utils.py
def clean(val):
    return val.strip()

# LOOSE CODE - This runs on every import!
print("Cleanup started...")

The Fix: Always wrap your "running" code in an if __name__ == "__main__": block.
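
A minimal sketch of my_utils.py with the guard in place. The main function is an assumed name; any code placed under the guard behaves the same way.

```python
# my_utils.py
def clean(val):
    return val.strip()

def main():
    # Runs only when the file is executed directly: python my_utils.py
    print("Cleanup started...")
    print(clean("  hello  "))

if __name__ == "__main__":
    main()
```

When another file runs import my_utils, __name__ is "my_utils", so main() is skipped and only the function definitions are loaded.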

7. Falsy Surprises (None vs 0)

The Misconception

Using if not value: is a safe way to check if data is missing or empty.

The Reality

In Python, numbers like 0 and 0.0 are considered "falsy". If you are cleaning numeric data (like prices or counts), if not value: will trigger for a price of zero, even though zero is a perfectly valid number!

<aside> 🎬 Terminal Tutorial: Falsy Surprises

</aside>

https://gist.githack.com/lassebenni/e4d7adbc87687c2731247cb8c68e0836/raw/falsy_surprises_terminal.html

Example

# BAD ❌ (drops zeros)
def process_price(price):
    if not price:  # Triggers for 0.0!
        return "MISSING"
    return f"${price}"

print(process_price(0.0))  # "MISSING" - WRONG!

The Fix: Specifically check for None or the expected type.

# GOOD ✅
def process_price(price):
    if price is None:
        return "MISSING"
    return f"${price}"
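
Zero is not the only value that trips up if not value:. The sketch below lists the common falsy values and extends process_price to also treat an empty string as missing, which is an assumption about how blank cells should be handled in your data.

```python
falsy_values = [None, 0, 0.0, "", [], {}, False]
print(any(bool(v) for v in falsy_values))  # False: every one of them is falsy

def process_price(price):
    # Only None and the empty string count as missing; 0 is a real price.
    if price is None or price == "":
        return "MISSING"
    return f"${float(price):.2f}"

print(process_price(0.0))    # $0.00
print(process_price(None))   # MISSING
print(process_price(""))     # MISSING
print(process_price("4.5"))  # $4.50
```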

8. The Silent Failure (Print vs Logging)

The Misconception

You use print("Error happened") to debug your code.

The Reality

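print() writes plain text to stdout (Standard Output) with no timestamp, severity level, or stack trace. Depending on how the job is run (cron, Docker containers, Cloud Functions), that output may be discarded, buffered, or buried among other output. If your script fails at 3 AM, a print statement rarely leaves you enough to diagnose it.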

Example

# BAD ❌
try:
    process_data()
except Exception as e:
    print(f"Error: {e}")  # Lost in the void

# GOOD ✅
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    process_data()
except Exception:
    logger.exception("Processing failed")  # Logged at ERROR level with the stack trace
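
The sketch below shows what a log record carries that a print does not. It writes to an in-memory stream so the result can be inspected; a real script would more likely use logging.FileHandler or logging.basicConfig(filename=...). The logger name, the format string, and the failing process_data are all made up for illustration.

```python
import io
import logging

log_stream = io.StringIO()  # stand-in for a log file
handler = logging.StreamHandler(log_stream)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)

logger = logging.getLogger("week1.pipeline")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep this demo's output in one place

def process_data():
    raise ValueError("bad row")  # simulated failure

try:
    process_data()
except Exception:
    # Inside an except block, exception() logs at ERROR and appends the traceback.
    logger.exception("Processing failed")

output = log_stream.getvalue()
print(output)
```

The printed record contains a timestamp, the level ERROR, the logger name, the message, and the full traceback ending in ValueError: bad row.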

9. The Automation Killer (Input vs Argparse)

The Misconception

You use filename = input("Enter filename: ") to make your script interactive.

The Reality

Data Engineering scripts are almost never run by humans. They are run by Scheduler Robots (cron, Airflow, GitHub Actions). Robots cannot type on a keyboard. Depending on the environment, your script will either hang waiting for input or crash with an EOFError because there is nothing to read.

Example

# BAD ❌ (Blocks automation)
filename = input("Which file? ")

# GOOD ✅ (Accepts arguments)
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--file", required=True, help="Path to the CSV file")  # required: fail fast if missing
args = parser.parse_args()

filename = args.file
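
A slightly fuller sketch, assuming a hypothetical --limit option alongside --file. Passing a list to parse_args lets you try the parser without a real command line, which is also handy in tests.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Process a CSV file.")
    parser.add_argument("--file", required=True, help="Path to the CSV file")
    # --limit is a made-up option showing type conversion and a default
    parser.add_argument("--limit", type=int, default=100, help="Max rows to process")
    return parser

# Equivalent to running: python script.py --file users.csv --limit 5
args = build_parser().parse_args(["--file", "users.csv", "--limit", "5"])
print(args.file)   # users.csv
print(args.limit)  # 5

# Omitting --limit falls back to the default
args = build_parser().parse_args(["--file", "users.csv"])
print(args.limit)  # 100
```

In the real script you would call parse_args() with no arguments so that it reads from the command line.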

10. The CSV String Trap (everything is text until you convert it)

The Misconception

A column called age in your CSV contains numbers, so you can do row["age"] + 1 directly.

The Reality

CSV is a text format with no type information. JSON numbers and booleans survive parsing, but by default every value coming out of csv.DictReader, csv.reader, or a raw HTTP response body is a string until you explicitly cast it. Forgetting to convert is the #1 day-one bug in the Week 1 assignment, and the error messages can look scary the first time you see them.

Example

# BAD ❌ (TypeError on every row)
import csv
with open("users.csv", newline="") as f:  # newline="" is what the csv docs recommend
    for row in csv.DictReader(f):
        next_year = row["age"] + 1   # TypeError: can only concatenate str (not "int") to str

# GOOD ✅ (cast at the boundary)
import csv
with open("users.csv", newline="") as f:
    for row in csv.DictReader(f):
        next_year = int(row["age"]) + 1  # Convert the string once, as soon as it is read
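
Real files often contain blank or malformed cells, and int("") raises a ValueError. The sketch below uses in-memory CSV text and a hypothetical parse_age helper that returns None for anything it cannot convert; whether to skip, default, or fail on bad values is a decision for your own pipeline.

```python
import csv
import io

# In-memory stand-in for users.csv, with one blank and one invalid age
raw = "name,age\nAlice,30\nBob,\nCarol,abc\n"

def parse_age(text):
    """Return the age as an int, or None if the cell is blank or not a number."""
    text = text.strip()
    if text == "":
        return None
    try:
        return int(text)
    except ValueError:
        return None

rows = list(csv.DictReader(io.StringIO(raw)))
print(type(rows[0]["age"]))  # <class 'str'>

ages = [parse_age(row["age"]) for row in rows]
print(ages)                  # [30, None, None]
```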

The three sub-traps that bite right after