Week 1 - Python Foundations

Errors and Debugging

No matter how experienced you become, you will write code that crashes. The core program already gave you a steady supply of crashing JS. In data engineering, the supply gets larger: messy real-world data routinely breaks pipelines that worked fine on a clean sample.

Learning to read errors and fix them (debugging) is a superpower. In this chapter, you'll learn how Python tells you something went wrong and how to investigate it.


Types of Errors

Just like in JavaScript, errors in Python generally fall into three categories:

1. Syntax Errors

These happen when Python doesn't understand your code because you broke the rules of the language. Python catches these before it runs your program.

# ❌ SyntaxError: expected ':'
if True
    print("This won't run")

Common Syntax Errors:

Since you have the VS Code Python extension installed, your editor flags these as you type, using Pylance, VS Code's default Python language server.
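
To see that syntax is checked before anything runs, here is a minimal sketch using the built-in compile() function, which checks a piece of source text without executing it. The helper name check_syntax and the sample snippets are made up for illustration, and the exact wording of the messages differs between Python versions.

```python
def check_syntax(source):
    """Return 'OK' if the source is valid Python, else the error name and message."""
    try:
        # compile() checks and compiles the text; it does not run it
        compile(source, "<snippet>", "exec")
    except SyntaxError as error:
        # IndentationError is a subclass of SyntaxError, so it lands here too
        return f"{type(error).__name__}: {error.msg}"
    return "OK"


# Hypothetical snippets, each with one typical mistake
snippets = {
    "missing colon": "if True\n    print('hi')",
    "unclosed bracket": "print('hi'",
    "bad indentation": "if True:\nprint('hi')",
    "valid code": "print('hi')",
}

for label, source in snippets.items():
    print(f"{label}: {check_syntax(source)}")
```

Notice that 'hi' is never printed: none of the snippets is executed, yet Python can already tell which ones are broken.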

2. Runtime Errors (Exceptions)

These happen while the program is running. The syntax is correct, but something illegal happened during execution.

# ❌ ZeroDivisionError: division by zero
result = 10 / 0

# ❌ NameError: name 'x' is not defined
print(x)
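
A few more exceptions show up constantly when working with messy data. The sketch below is illustrative only: the four small functions and the error_name helper are hypothetical, and it uses try/except just to report the exception's name instead of crashing.

```python
def parse_age(text):
    return int(text)            # ValueError if text is not a number


def first_city(record):
    return record["city"]       # KeyError if the key is missing


def third_item(items):
    return items[2]             # IndexError if the list is too short


def add_label(count):
    return "Total: " + count    # TypeError if count is not a string


def error_name(func, argument):
    """Call func(argument) and return the name of the exception it raises."""
    try:
        func(argument)
    except Exception as error:
        return type(error).__name__
    return "no error"


print(error_name(parse_age, "abc"))              # ValueError
print(error_name(first_city, {"name": "Alice"})) # KeyError
print(error_name(third_item, [1, 2]))            # IndexError
print(error_name(add_label, 5))                  # TypeError
print(error_name(parse_age, "42"))               # no error
```

All five calls are valid syntax. Whether they fail depends entirely on the data passed in, which is why these errors only appear at runtime.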

3. Logical Errors

The program runs without crashing, but it does the wrong thing. These are the hardest to catch because Python won't give you an error message.

# 🐛 Logical Error: Calculating average incorrectly
numbers = [10, 20, 30]
average = sum(numbers) / 2  # Should be divided by len(numbers), which is 3!
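
Since Python stays silent about logical errors, one way to catch them is to work out the expected answer by hand and compare. A minimal sketch, assuming you wrap the calculation in a function (the name average is chosen here for illustration):

```python
def average(values):
    # Divide by the actual number of items instead of a hard-coded 2
    return sum(values) / len(values)


numbers = [10, 20, 30]

# Expected answer worked out by hand: (10 + 20 + 30) / 3 = 20
assert average(numbers) == 20, "average() returned an unexpected value"
print(average(numbers))  # 20.0
```

With the original `/ 2` version, the assert would fail with an AssertionError, turning a silent logical error into a visible one.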

Reading Stack Traces

When Python crashes, it prints a "Stack Trace" (or Traceback). It looks intimidating, but it's actually very helpful. It tells you exactly where the problem is.

Unlike some JavaScript error messages, which can be vague, Python's traceback is usually very precise.

Example Traceback:

Traceback (most recent call last):
  File "main.py", line 5, in <module>
    calculate_total(10, 0)
  File "main.py", line 2, in calculate_total
    return a / b
ZeroDivisionError: division by zero

How to read it:

  1. Start at the bottom: The last line tells you the type of error (ZeroDivisionError) and the message (division by zero).
  2. Look just above it: It tells you the file (main.py), the line number (line 2), and the code that caused the crash.
  3. Go up: If the error happened inside a function, the lines above show you who called that function.
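
The three reading steps above can also be walked through in code. This is a sketch rather than something you would normally write: it catches the error and uses the standard library's traceback module to list the same frames the printed traceback shows. The function calculate_total is reconstructed from the example traceback; the line numbers you see will differ from the example.

```python
import traceback


def calculate_total(a, b):
    return a / b


try:
    calculate_total(10, 0)
except ZeroDivisionError as error:
    # Frames are listed oldest-first, the same order as the printed traceback
    frames = traceback.extract_tb(error.__traceback__)
    crash_site = frames[-1]  # the last frame is where the error was raised
    error_type = type(error).__name__

    # Step 1: the bottom line of a traceback (error type and message)
    print(f"{error_type}: {error}")
    # Step 2: the frame just above it (where the crash happened)
    print(f"Crashed in {crash_site.name}(), line {crash_site.lineno}")
    # Step 3: everything further up (who called the crashing function)
    for frame in frames[:-1]:
        print(f"Called from {frame.name}, line {frame.lineno}")
```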

🖐 Hands-on: Be the Detective

Copy the following code into a file named buggy.py and run it. Look at the traceback. Which line actually caused the crash? Which line called the function that crashed?

def greet(name):
    return "Hello " + name

def welcome_users(users):
    for user in users:
        print(greet(user))

# There is a bug here!
user_list = ["Alice", "Bob", 123]
welcome_users(user_list)

Debugging Techniques

Option 1: "Print" Debugging

The simplest way to debug is often the most effective. If your code isn't doing what you expect, print() the values of your variables at different steps.

def add_tax(amount):
    print(f"DEBUG: amount is {amount}")  # 👀 Check input
    tax = amount * 0.21
    print(f"DEBUG: tax calculated is {tax}")  # 👀 Check intermediate value
    return amount + tax
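
With messy data, the type of a value is often the real culprit: the text "100" and the number 100 look identical in a plain print(). A small sketch, assuming Python 3.8 or newer (for the `=` specifier in f-strings); describe is a hypothetical helper name.

```python
def describe(value):
    # {value=} prints the variable name plus the repr of its value,
    # so the string '100' and the number 100 look different.
    return f"DEBUG: {value=} (type: {type(value).__name__})"


print(describe(100))    # DEBUG: value=100 (type: int)
print(describe("100"))  # DEBUG: value='100' (type: str)
```

The quotes around '100' in the second line are the giveaway that the value arrived as text, for example from a CSV file.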

🖐 Hands-on: Fix the Logic

The following code tries to find the largest number in a list, but it returns the wrong answer. Use print() statements to trace the loop and fix the logical error.

numbers = [1, 5, 2, 9, 3]
max_num = 0

for n in numbers:
    if n < max_num:  # 🤔 Is this correct?
        max_num = n

print(f"The largest number is {max_num}")

Option 2: VS Code Debugger

Using print() is fine for small scripts, but for larger applications, it gets messy. Imagine having to delete 50 print statements before committing your code!

Visual Studio Code has a built-in Debugger. It allows you to pause your code in the middle of execution and look at the variables live.

1. Setting a Breakpoint

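A breakpoint is a stop sign for your code. When Python reaches this line, it pauses before executing it. To set one, click in the margin just to the left of the line number; a red dot appears.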

2. Starting the Debugger

Instead of clicking the "Play" button at the top right:

  1. Click the Run and Debug icon on the left sidebar (it looks like a bug with a play button ▷🐛).
  2. Click the big blue Run and Debug button.
  3. Select Python Debugger -> Python File if asked.

Your code will start running and freeze at your red dot.

3. Controlling the Flow

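Once paused, a floating toolbar appears at the top. Here are the most important buttons:

  1. Continue: run until the next breakpoint (or the end of the program).
  2. Step Over: run the current line and pause on the next one.
  3. Step Into: enter the function called on the current line.
  4. Step Out: finish the current function and pause back in its caller.
  5. Stop: end the debugging session.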

4. Inspecting Variables

Look at the Variables panel on the left side. You can see the value of every variable at that exact moment. No more print(variable) needed!

⌨️ Hands-on: The Runaway Batch Job

You are writing a script to fetch data records from an API. You want to collect exactly 20 records to form a "batch" before saving them. The API gives you records in chunks of 3.

This code never stops (an infinite loop) and floods your terminal with output. Do not fix it by guessing! Use the debugger to find out why it misses the target.

  1. Copy the code below into VS Code.
  2. Set a breakpoint 🔴 on the line current_count += 3.
  3. Start the Debugger (Run and Debug).
  4. Watch the current_count variable in the Variables panel on the left.
  5. Keep clicking Continue (▷). What value does current_count have when the loop should stop, but doesn't?

target_batch_size = 20
current_count = 0

print("--- Starting Batch Collection ---")

# We need exactly 20 records to close the batch
while current_count != target_batch_size:
    print(f"Status: We have {current_count} records...")

    # Simulate fetching 3 records at a time
    current_count += 3

print("Batch successfully collected!")

Once you see the variable skip past 20 in the debugger, you'll realize why != (not equal) is dangerous here. How would you change the while condition to make it safe?


<aside> 🤓 Curious Geek: The "first computer bug"

The word "bug" for a technical defect is older than computers: Thomas Edison used it in an 1878 letter to describe the small faults and difficulties that turned up in his inventions. The famous "first actual bug" is from September 9, 1947: a moth got stuck in relay #70, panel F of the Harvard Mark II electromechanical computer. Operators on the team (which included Grace Hopper) taped the moth into the engineering log under the entry "First actual case of bug being found." The log page survives in the Smithsonian's National Museum of American History. The story is sometimes told as the origin of the word; that's wrong, but the moth genuinely was a physical bug that caused a hardware fault.

</aside>

The fastest way to internalise tracebacks and the debugger is to use them on a script that is already broken.

<aside> 📝 Practice: The week's Practice chapter has two exercises where the debugger pays for itself fast: Ex 2 (the Data Cleaner has a TypeError and a logic bug) and Ex 3 (the Precision Trap: stepping through 0.1 + 0.2 == 0.3 shows you why floats lie). Try at least one with breakpoints set instead of print().

</aside>

🧠 Knowledge Check


<aside> 🚀 Try it in the widget: Interactive Quiz: Errors and Debugging

</aside>

https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_1_ch7_reading_errors_quiz&embed=1



Next up: Logging in Python, where you replace ad-hoc print() debugging with the logging module: leveled messages, configurable output, and a habit you keep using once code reaches production.