Week 1 - Foundational Python

⚠️ Week 1 Gotchas & Pitfalls

Welcome to the "Gotchas" section. These are the subtle traps that every Python developer falls into at least once. We're telling you now so you can spot them when (not if) they happen to you.

1. The Mutable Default Argument

This is the most famous Python gotcha.

The Misconception

You think that a default such as students=[] in a function signature creates a new empty list every time you call the function.

The Reality

Python creates the default object once when the function is defined, not when it is called. If you modify that object, the change persists for the next call!

Example

# BAD ❌
def add_student(name, students=[]):
    students.append(name)
    return students

print(add_student("Alice"))  # ['Alice'] - Good
print(add_student("Bob"))    # ['Alice', 'Bob'] - WAIT WHAT? (Alice is still there!)

# GOOD ✅
def add_student(name, students=None):
    if students is None:
        students = []  # Create a new list inside the function
    students.append(name)
    return students
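
One way to see the shared default for yourself is to inspect the function's __defaults__ attribute, which is where Python keeps default values after the function is defined. This is a minimal sketch; the name add_student_shared is ours, chosen only to keep it separate from the examples above.

```python
def add_student_shared(name, students=[]):
    """Buggy on purpose: the default list is created once, at definition time."""
    students.append(name)
    return students


# The default list lives on the function object itself.
print(add_student_shared.__defaults__)  # ([],)

first = add_student_shared("Alice")
second = add_student_shared("Bob")

# Both calls returned the very same list object.
print(first is second)                  # True
print(add_student_shared.__defaults__)  # (['Alice', 'Bob'],)
```

Because the list is attached to the function, every call that omits the argument keeps mutating the same object.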

2. The Precision Trap (Floats)

The Misconception

You think 0.1 + 0.2 equals 0.3.

The Reality

Python floats are stored in binary (base 2) floating point, which cannot represent some decimal fractions, such as 0.1, exactly. The tiny rounding errors surface when you compare or accumulate results.

Example

print(0.1 + 0.2 == 0.3)  # False!
print(0.1 + 0.2)         # 0.30000000000000004

⚠️ Data Engineering Rule: Never use float types for money. Use integers (cents) or the decimal module.
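
The rule above can be sketched as follows, together with a tolerance-based comparison for cases where floats are unavoidable. The variable names and amounts are made up for illustration.

```python
import math
from decimal import Decimal

# Compare floats with a tolerance instead of ==.
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Option 1: store money as integer cents.
price_cents = 1999  # hypothetical price of 19.99
quantity = 3
total_cents = price_cents * quantity
print(total_cents)  # 5997

# Option 2: use Decimal, built from strings.
# (Decimal(0.1) built from a float would carry the float's error with it.)
total = Decimal("0.10") + Decimal("0.20")
print(total == Decimal("0.30"))  # True
print(total)                     # 0.30
```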

3. Shadowing Built-in Names

The Misconception

You need a variable to store a list of users, so you call it list.

The Reality

You just shadowed Python's built-in list. For the rest of that scope, the name list refers to your data, so you can no longer call list() to create new lists or convert other things to lists.

Example

# BAD ❌
list = [1, 2, 3]  # You just killed the list() function
my_tuple = (4, 5)
numbers = list(my_tuple)  # TypeError: 'list' object is not callable

Common names to avoid: list, str, dict, set, type, id, min, max, sum.
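
If you are unsure whether a name is taken, the standard builtins module can tell you. The helper shadows_builtin below is a hypothetical convenience of ours, not part of Python.

```python
import builtins


def shadows_builtin(name):
    """Return True if `name` is also the name of a Python built-in."""
    return hasattr(builtins, name)


print(shadows_builtin("list"))       # True
print(shadows_builtin("user_list"))  # False

# GOOD: a descriptive name leaves the built-in untouched.
user_list = [1, 2, 3]
numbers = list((4, 5))  # list() still works
print(numbers)          # [4, 5]
```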

4. The Path Separator Trap

The Misconception

You hardcode file paths with one particular separator, like data/file.csv or data\file.csv, and assume they work everywhere.

The Reality

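Windows uses backslashes (\) as its path separator. Mac/Linux use forward slashes (/). A hardcoded backslash path breaks on Mac/Linux, where the backslash is treated as an ordinary character in the filename. Python on Windows generally accepts forward slashes, but building paths by hand is still fragile.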

Example

# BAD ❌
filename = "data\\exports\\users.csv"  # Breaks on Mac/Linux

# GOOD ✅
import os
filename = os.path.join("data", "exports", "users.csv")

# BETTER (pathlib, Python 3.4+) ✅
from pathlib import Path
filename = Path("data") / "exports" / "users.csv"
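
To see what pathlib does with the separators, you can use its "pure" path classes, which only manipulate strings and therefore behave the same on any operating system. A small sketch using the same path parts as above:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

parts = ("data", "exports", "users.csv")

# Pure paths never touch the filesystem, so both flavours work on any OS.
print(PurePosixPath(*parts))    # data/exports/users.csv
print(PureWindowsPath(*parts))  # data\exports\users.csv

# Path picks the right flavour for the machine the script runs on.
filename = Path(*parts)
print(filename.name)               # users.csv
print(filename.suffix)             # .csv
print(filename.parent.as_posix())  # data/exports
```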

5. The Reference Trap

The Misconception

You think list_b = list_a creates a second, independent list.

The Reality

Assignment never copies. Both names refer to the same list object in memory, so a change made through one name is visible through the other.

Example

# BAD ❌
original = [1, 2, 3]
copy = original  # This is NOT a copy!
copy.append(4)

print(original) # [1, 2, 3, 4] - The original was changed too!

# GOOD ✅
original = [1, 2, 3]
copy = original.copy()  # Or: list(original) or original[:]
copy.append(4)

print(original) # [1, 2, 3] - Safe!
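
One caveat worth knowing: .copy(), list(...), and [:] make a shallow copy, so nested lists are still shared. For nested data, the standard library's copy.deepcopy is one option. A minimal sketch with made-up data:

```python
import copy

original = [[1, 2], [3, 4]]

# A shallow copy duplicates the outer list only; the inner lists are shared.
shallow = original.copy()
shallow[0].append(99)
print(original)  # [[1, 2, 99], [3, 4]]

# A deep copy duplicates the inner lists as well.
deep = copy.deepcopy(original)
deep[0].append(100)
print(original)  # [[1, 2, 99], [3, 4]]
print(deep)      # [[1, 2, 99, 100], [3, 4]]
```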

6. The Import Side-Effect

The Misconception

You think import my_utils only gives you access to the functions inside it.

The Reality

Python executes the entire file, top to bottom, the first time it is imported in a program. If you have "loose" print statements or file writes at the bottom of your utility script, they run as a side effect of that import!

Example

# my_utils.py
def clean(val):
    return val.strip()

# LOOSE CODE - This runs whenever the module is imported!
print("Cleanup started...")

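The Fix: Always wrap your "running" code in an if __name__ == "__main__": block. That condition is true only when the file is executed directly (for example, python my_utils.py), not when it is imported.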
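
Applied to my_utils.py, the guard might look like the sketch below. The main() function is our addition for illustration; any name works.

```python
# my_utils.py
def clean(val):
    """Strip leading and trailing whitespace from a string."""
    return val.strip()


def main():
    # Code that should run only when the file is executed directly.
    print("Cleanup started...")
    print(clean("  hello  "))


if __name__ == "__main__":
    main()
```

With this layout, import my_utils gives you clean() without printing anything, while python my_utils.py still runs main().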

7. Falsy Surprises (None vs 0)

The Misconception

You think if not value: is a safe way to check whether data is missing or empty.

The Reality

In Python, numbers like 0 and 0.0 are considered "falsy". If you are cleaning numeric data (like prices or counts), if not value: will trigger for a price of zero, even though zero is a perfectly valid number!

Example

# BAD ❌ (drops zeros)
def process_price(price):
    if not price:  # Triggers for 0.0!
        return "MISSING"
    return f"${price}"

print(process_price(0.0))  # "MISSING" - WRONG!

The Fix: Check explicitly for None (or for the expected type) instead of relying on truthiness.

# GOOD ✅
def process_price(price):
    if price is None:
        return "MISSING"
    return f"${price}"
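
To get a feel for which values Python treats as falsy, you can run them through a quick filter. The is_missing helper below is a hypothetical example of an explicit check; which values count as "missing" depends on your data.

```python
candidates = [None, 0, 0.0, "", [], {}, "0", [0]]

# Everything that would trigger `if not value:`.
falsy = [value for value in candidates if not value]
print(falsy)  # [None, 0, 0.0, '', [], {}]


def is_missing(value):
    """Treat only None and the empty string as missing; zero is valid data."""
    return value is None or value == ""


print(is_missing(0))     # False
print(is_missing(None))  # True
```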

8. The Silent Failure (Print vs Logging)

The Misconception

You think print("Error happened") is enough to report errors in your code.

The Reality

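print() writes to stdout (standard output) with no timestamp, severity level, or stack trace. In production environments (scheduled jobs, Docker containers, Cloud Functions), stdout may be buffered, mixed with other output, or never retained, depending on how the platform is configured. If your script fails at 3 AM, a bare print statement leaves you little or nothing to investigate.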

Example

# BAD ❌
try:
    process_data()
except Exception as e:
    print(f"Error: {e}")  # No timestamp, no level, no stack trace

# GOOD ✅
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    process_data()
except Exception:
    # exc_info=True attaches the full stack trace to the log record
    logger.error("Processing failed", exc_info=True)
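
A slightly fuller sketch, assuming a hypothetical process_data that sums an "amount" field, shows a log format with timestamps and uses logger.exception, which logs at ERROR level and includes the stack trace when called inside an except block.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)


def process_data(rows):
    """Hypothetical stand-in: sum the 'amount' field of each row."""
    return sum(row["amount"] for row in rows)


def run(rows):
    """Process rows, logging success or failure; return None on failure."""
    try:
        total = process_data(rows)
    except Exception:
        logger.exception("Processing failed")
        return None
    logger.info("Processed %d rows, total=%s", len(rows), total)
    return total


run([{"amount": 5}, {"amount": 7}])  # logs an INFO line
run([{"amount": 5}, {"price": 7}])   # logs an ERROR line plus the KeyError traceback
```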

9. The Automation Killer (Input vs Argparse)

The Misconception

You use filename = input("Enter filename: ") to make your script interactive.

The Reality

Data Engineering scripts are almost never run by humans. They are run by schedulers (cron, Airflow, GitHub Actions), and nobody is at the keyboard. Depending on how the job is launched, input() either blocks indefinitely waiting for input or fails immediately with EOFError because there is nothing to read.

Example

# BAD ❌ (Blocks automation)
filename = input("Which file? ")

# GOOD ✅ (Accepts arguments)
import argparse

parser = argparse.ArgumentParser()
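# required=True makes the script fail fast with a usage message if --file is missing
parser.add_argument("--file", required=True, help="Path to the CSV file")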
args = parser.parse_args()

filename = args.file
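
For testing, it helps to wrap the parser in a function that accepts an argument list. This is one possible layout, not the only one; the script name load_users.py and the path data/users.csv are placeholders.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser(description="Process a CSV file.")
    parser.add_argument("--file", required=True, help="Path to the CSV file")
    return parser.parse_args(argv)


# Simulates running: python load_users.py --file data/users.csv
args = parse_args(["--file", "data/users.csv"])
print(args.file)  # data/users.csv
```

A scheduler can then pass the filename on the command line, and a test can pass a plain list without touching sys.argv.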
