Welcome to the "Gotchas" section. These are the subtle traps that every Python developer falls into at least once. We're telling you now so you can spot them when (not if) they happen to you.
This is the most famous Python gotcha.
You think that a default like students=[] in a function signature creates a new empty list every time you call the function.
Python creates the default object once when the function is defined, not when it is called. If you modify that object, the change persists for the next call!
🎬 Terminal Tutorial: Mutable Defaults
# BAD ❌
def add_student(name, students=[]):
    students.append(name)
    return students
print(add_student("Alice")) # ['Alice'] - Good
print(add_student("Bob")) # ['Alice', 'Bob'] - WAIT WHAT? (Alice is still there!)
# GOOD ✅
def add_student(name, students=None):
    if students is None:
        students = [] # Create a new list inside the function
    students.append(name)
    return students
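One way to see the shared default for yourself is to look at the function's `__defaults__` attribute, which holds the default objects Python built when the function was defined. The sketch below reuses the buggy version under a hypothetical name, `add_student_buggy`, so it does not clash with the fixed one.

```python
# Hypothetical name: the buggy version, renamed so both versions can coexist.
def add_student_buggy(name, students=[]):
    students.append(name)
    return students

print(add_student_buggy.__defaults__)  # ([],) - one list, created at definition time

first = add_student_buggy("Alice")
second = add_student_buggy("Bob")

print(first is second)                 # True - both calls returned the same list object
print(add_student_buggy.__defaults__)  # (['Alice', 'Bob'],) - the default itself was mutated
```

Passing your own list (for example `add_student_buggy("Cara", [])`) sidesteps the shared default, which is why this bug often hides until someone relies on the default.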
You think 0.1 + 0.2 equals 0.3.
Computers store decimals in binary (base 2), which cannot perfectly represent some fractions like 0.1.
🎬 Terminal Tutorial: Float Precision
print(0.1 + 0.2 == 0.3) # False!
print(0.1 + 0.2) # 0.30000000000000004
⚠️ Data Engineering Rule: Never use float types for money. Use integers (cents) or the decimal module.
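The rule above can be sketched with the standard library alone. This is a minimal illustration, not a full money-handling recipe: `math.isclose` covers float comparisons, `Decimal` built from strings covers exact decimal arithmetic, and plain integers cover the "cents" approach.

```python
import math
from decimal import Decimal

# Comparing floats: use a tolerance instead of ==
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Money as Decimal: build from strings, not from floats
price = Decimal("0.10") + Decimal("0.20")
print(price)                         # 0.30
print(price == Decimal("0.30"))      # True

# Money as integer cents
total_cents = 10 + 20
print(total_cents)                   # 30
```

Building a `Decimal` from a float, as in `Decimal(0.1)`, carries the binary rounding error along with it, so strings are the safer input.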
You need a variable to store a list of users, so you call it list.
You just replaced Python's built-in list() function with your own variable. For the rest of that scope, calling list(...) to build a list or convert something to a list fails.
🎬 Terminal Tutorial: Shadowing Built-ins
# BAD ❌
list = [1, 2, 3] # You just killed the list() function
my_tuple = (4, 5)
numbers = list(my_tuple) # TypeError: 'list' object is not callable
Common names to avoid: list, str, dict, set, type, id, min, max, sum.
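If you have already shadowed a name, the built-in is hidden rather than destroyed. The sketch below shows that the original function is still reachable through the standard `builtins` module, and that deleting the shadowing variable at module or REPL level restores normal lookup.

```python
import builtins

list = [1, 2, 3]   # shadows the built-in for the rest of this scope

try:
    list((4, 5))
except TypeError as exc:
    print(exc)     # 'list' object is not callable

# The real function still lives in the builtins module
numbers = builtins.list((4, 5))
print(numbers)     # [4, 5]

# Remove the shadowing variable once you notice the mistake
del list
```

The cleaner fix is still a descriptive name such as `user_list`, so the problem never appears.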
You hardcode file paths using slashes, like data/file.csv or data\\file.csv.
Windows uses backslashes (\). Mac/Linux use forward slashes (/). If you hardcode a separator, and backslashes in particular, your code breaks on other operating systems.
# BAD ❌
filename = "data\\exports\\users.csv" # Breaks on Mac/Linux
# GOOD ✅
import os
filename = os.path.join("data", "exports", "users.csv")
# BETTER (Python 3) ✅
from pathlib import Path
filename = Path("data") / "exports" / "users.csv"
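To see what pathlib does with the separator, you can build the same path with the two "pure" path classes, which work on any operating system because they never touch the file system. A small sketch:

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("data", "exports", "users.csv")

print(PurePosixPath(*parts))    # data/exports/users.csv
print(PureWindowsPath(*parts))  # data\exports\users.csv

# Forward slashes in a string are understood by the Windows flavour too
print(PureWindowsPath("data/exports/users.csv").parts)  # ('data', 'exports', 'users.csv')
```

In normal code you would use plain `Path`, which picks the right flavour for the machine the script is running on.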
You think list_b = list_a creates a second, independent list.
Both variables point to the same object in memory. If you change one, you change both.
🎬 Terminal Tutorial: Reference Trap
# BAD ❌
original = [1, 2, 3]
copy = original # This is NOT a copy!
copy.append(4)
print(original) # [1, 2, 3, 4] - The original was changed too!
# GOOD ✅
original = [1, 2, 3]
copy = original.copy() # Or: list(original) or original[:]
copy.append(4)
print(original) # [1, 2, 3] - Safe!
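A related caveat: `.copy()`, `list(...)`, and `[:]` all make a shallow copy, so they only duplicate the outer list. If the list contains other lists (or dicts), those inner objects are still shared. The sketch below, using made-up nested data, contrasts a shallow copy with `copy.deepcopy` from the standard library.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = original.copy()        # new outer list, same inner lists
shallow[0].append(99)
print(original)                  # [[1, 2, 99], [3, 4]] - inner list was shared

deep = copy.deepcopy(original)   # new outer list and new inner lists
deep[0].append(100)
print(original)                  # [[1, 2, 99], [3, 4]] - unchanged this time
```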
You think import my_utils only gives you access to the functions inside it.
Python executes the entire file when you import it. If you have "loose" print statements or file writes at the bottom of your utility script, they will run every time you import it!
🎬 Terminal Tutorial: Import Side-effects
# my_utils.py
def clean(val):
    return val.strip()
# LOOSE CODE - This runs on every import!
print("Cleanup started...")
The Fix: Always wrap your "running" code in an if __name__ == "__main__": block.
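Applied to the file above, the fix might look like the sketch below. The `main()` function is a hypothetical name for illustration; the important part is the guard on the last two lines.

```python
# my_utils.py
def clean(val):
    return val.strip()


def main():
    # Hypothetical entry point: only meant to run when the file is executed directly
    print("Cleanup started...")
    print(clean("  hello  "))


if __name__ == "__main__":
    main()
```

Running `python my_utils.py` prints the messages, while `import my_utils` from another script only defines `clean` and `main` without printing anything.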
You think if not value: is a safe way to check whether data is missing or empty.
In Python, numbers like 0 and 0.0 are considered "falsy". If you are cleaning numeric data (like prices or counts), if not value: will trigger for a price of zero, even though zero is a perfectly valid number!
🎬 Terminal Tutorial: Falsy Surprises
# BAD ❌ (drops zeros)
def process_price(price):
    if not price: # Triggers for 0.0!
        return "MISSING"
    return f"${price}"
print(process_price(0.0)) # "MISSING" - WRONG!
The Fix: Specifically check for None or the expected type.
# GOOD ✅
def process_price(price):
    if price is None:
        return "MISSING"
    return f"${price}"
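To make the difference concrete, the sketch below runs both checks over a handful of sample values (chosen purely for illustration) and collects what each one flags.

```python
candidates = [None, 0, 0.0, "", [], {}, "0", 0.01]

# Everything that `if not value:` would treat as missing
falsy = [value for value in candidates if not value]

# Everything that `if value is None:` would treat as missing
missing = [value for value in candidates if value is None]

print(falsy)    # [None, 0, 0.0, '', [], {}]
print(missing)  # [None]
```

Note that the string "0" is truthy because it is a non-empty string, which is another surprise when numbers arrive as text from a CSV file.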
You use print("Error happened") to debug your code.
print() writes bare text to stdout (Standard Output) with no timestamp, severity level, or stack trace, and that output is often discarded or hard to find in production environments (like Docker containers or Cloud Functions). If your script fails at 3 AM, your print statements may be gone forever.
# BAD ❌
try:
    process_data()
except Exception as e:
    print(f"Error: {e}") # Lost in the void
# GOOD ✅
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
try:
    process_data()
except Exception:
    logger.error("Processing failed", exc_info=True) # Logged with level, logger name, and stack trace
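By default, `logging.basicConfig` sends messages to stderr, so whether they survive depends on how the job's output is collected. If you want the messages in a file, one option is a `FileHandler`. The sketch below is only an illustration: the log path, the logger name `pipeline_example`, and the failing `process_data` stand-in are all assumptions, not part of the original example.

```python
import logging
import os
import tempfile

# Hypothetical log location; a real job would use a path agreed with whoever operates it
log_path = os.path.join(tempfile.gettempdir(), "pipeline_example.log")

logger = logging.getLogger("pipeline_example")
logger.setLevel(logging.INFO)

handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)


def process_data():
    # Stand-in for real work, so the example has something to fail on
    raise ValueError("bad row")


try:
    process_data()
except Exception:
    # logger.exception logs at ERROR level and appends the traceback
    logger.exception("Processing failed")

handler.close()
```

`logger.exception(...)` is shorthand for `logger.error(..., exc_info=True)` and is meant to be called from inside an `except` block.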
You use filename = input("Enter filename: ") to make your script interactive.
Data Engineering scripts are almost never run by humans. They are run by scheduler robots (cron, Airflow, GitHub Actions). Robots cannot type on a keyboard. Your script will either hang waiting for input or crash with an EOFError because there is nothing to read.
# BAD ❌ (Blocks automation)
filename = input("Which file? ")
# GOOD ✅ (Accepts arguments)
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--file", help="Path to the CSV file")
args = parser.parse_args()
filename = args.file
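A slightly fuller sketch of the same idea is shown below. Two things here are assumptions added for illustration: `required=True`, so a missing `--file` fails immediately with a usage message instead of silently giving `None`, and the `parse_args(argv)` wrapper, which lets you pass an explicit argument list when testing.

```python
import argparse


def parse_args(argv=None):
    # argv=None makes argparse read sys.argv[1:], as usual
    parser = argparse.ArgumentParser(description="Load a CSV file")
    parser.add_argument("--file", required=True, help="Path to the CSV file")
    return parser.parse_args(argv)


# Simulates: python load.py --file data/users.csv  (load.py is a hypothetical script name)
args = parse_args(["--file", "data/users.csv"])
print(args.file)  # data/users.csv
```

A scheduler can then call the script with the path on the command line, and nobody has to be at the keyboard.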