No matter how experienced you become, you will write code that crashes. The core program already gave you a steady supply of crashing JS. In data engineering, the supply gets larger: messy real-world data routinely breaks pipelines that worked fine on a clean sample.
Learning to read errors and fix them (debugging) is a superpower. In this chapter, you'll learn how Python tells you something went wrong and how to investigate it.
Just like in JavaScript, errors in Python generally fall into three categories:
Syntax errors happen when Python doesn't understand your code because you broke the rules of the language. Python catches these before it runs your program.
# ❌ SyntaxError: expected ':'
if True
    print("This won't run")
Common Syntax Errors:
- Missing : at the end of if, for, def.
- Unclosed parentheses ( or brackets [.
Since you have the VSCode Python Extension installed, your IDE flags these as you write the code, using Pylance, VSCode's default Python language support tool.
Runtime errors happen while the program is running. The syntax is correct, but an operation fails during execution.
# ❌ ZeroDivisionError: division by zero
result = 10 / 0
# ❌ NameError: name 'x' is not defined
print(x)
With a logical error, the program runs without crashing, but it does the wrong thing. These are the hardest to catch because Python won't give you an error message.
# 🐛 Logical Error: Calculating average incorrectly
numbers = [10, 20, 30]
average = sum(numbers) / 2 # Should be divided by len(numbers), which is 3!
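The three categories can be told apart mechanically, which may make the distinction more concrete. The sketch below is an illustration only: classify is a hypothetical helper (not part of the course material) that uses the built-in compile() and exec() to check whether a snippet fails before running, fails while running, or runs to completion.

```python
def classify(source):
    """Report which kind of failure a code snippet hits, if any.

    Hypothetical helper for illustration; not part of the course material.
    """
    try:
        # compile() parses the code without running it, so only
        # syntax errors can surface here.
        code = compile(source, "<snippet>", "exec")
    except SyntaxError:
        return "syntax error"
    try:
        # exec() actually runs the snippet in a fresh, empty namespace.
        exec(code, {})
    except Exception as err:
        return f"runtime error: {type(err).__name__}"
    # Python cannot tell whether the result is what you intended.
    return "no crash"


print(classify("if True\n    print('hi')"))  # missing colon
print(classify("result = 10 / 0"))
print(classify("print(x)"))
print(classify("numbers = [10, 20, 30]\naverage = sum(numbers) / 2"))
```

The last snippet is the logical error from above: it reports no crash even though the average is wrong, which is why those bugs only show up when you inspect the values yourself.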
When Python crashes, it prints a "Stack Trace" (or Traceback). It looks intimidating, but it's actually very helpful. It tells you exactly where the problem is.
Unlike some JavaScript error messages which can be vague, Python's traceback is usually very precise.
Example Traceback:
Traceback (most recent call last):
  File "main.py", line 5, in <module>
    calculate_total(10, 0)
  File "main.py", line 2, in calculate_total
    return a / b
ZeroDivisionError: division by zero
How to read it:
1. Start at the bottom: the last line gives the error type (ZeroDivisionError) and the message (division by zero).
2. Look just above it: that entry gives the file (main.py), the line number (line 2), and the code that caused the crash.
Copy the following code into a file named buggy.py and run it. Look at the traceback. Which line actually caused the crash? Which line called the function that crashed?
def greet(name):
    return "Hello " + name

def welcome_users(users):
    for user in users:
        print(greet(user))

# There is a bug here!
user_list = ["Alice", "Bob", 123]
welcome_users(user_list)
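To connect the earlier calculate_total traceback to running code, here is a sketch that recreates that crash and inspects it with the standard-library traceback module. The body of calculate_total is inferred from the traceback shown earlier, and the try/except is only there so the script can continue and print what it found.

```python
import traceback


def calculate_total(a, b):
    return a / b  # crashes when b is 0


try:
    calculate_total(10, 0)
except ZeroDivisionError as err:
    # The same text Python would print if the error were not caught.
    report = traceback.format_exc()
    # One entry per call, outermost first; the last one is where it crashed.
    frames = traceback.extract_tb(err.__traceback__)

print(report)
crash = frames[-1]
print(f"Crashed in {crash.name}, line {crash.lineno}")
```

The order of frames mirrors the printed traceback: the call site comes first and the line that actually failed comes last, which is why reading from the bottom up is usually the quickest route to the cause.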
The simplest way to debug is often the most effective. If your code isn't doing what you expect, print() the values of your variables at different steps.
def add_tax(amount):
    print(f"DEBUG: amount is {amount}")  # 👀 Check input
    tax = amount * 0.21
    print(f"DEBUG: tax calculated is {tax}")  # 👀 Check intermediate value
    return amount + tax
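A small variation that can save typing, assuming Python 3.8 or newer: putting = after an expression inside an f-string prints both the expression and its value. The sketch reuses add_tax from above; the input value 100 is just an example.

```python
def add_tax(amount):
    print(f"DEBUG: {amount=}")  # prints the variable name and its value
    tax = amount * 0.21
    print(f"DEBUG: {tax=}")
    return amount + tax


total = add_tax(100)
print(f"{total=}")
```

Because the variable name is generated for you, the label in the debug output cannot drift out of sync with the variable it describes.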
The following code tries to find the largest number in a list, but it returns the wrong answer. Use print() statements to trace the loop and fix the logical error.
numbers = [1, 5, 2, 9, 3]
max_num = 0
for n in numbers:
    if n < max_num:  # 🤔 Is this correct?
        max_num = n
print(f"The largest number is {max_num}")
Using print() is fine for small scripts, but for larger applications, it gets messy. Imagine having to delete 50 print statements before committing your code!
Visual Studio Code has a built-in Debugger. It allows you to pause your code in the middle of execution and look at the variables live.
A breakpoint is a stop sign for your code. To set one, click in the margin to the left of a line number; a red dot appears. When Python reaches that line, it pauses before executing it.
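Instead of clicking the "Play" button at the top right, start the script in debug mode: open the Run and Debug view in the sidebar and click Run and Debug (or press F5), then pick the option for debugging the current Python file if VS Code asks.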
Your code will start running and freeze at your red dot.
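Once paused, a floating toolbar appears at the top. Here are the most important buttons (with their default shortcuts):
- Continue (F5): run until the next breakpoint.
- Step Over (F10): run the current line and pause on the next one.
- Step Into (F11): enter the function called on the current line.
- Step Out (Shift+F11): finish the current function and pause in its caller.
- Stop (Shift+F5): end the debugging session.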
Look at the Variables panel on the left side. You can see the value of every variable at that exact moment. No more print(variable) needed!
You are writing a script to fetch data records from an API. You want to collect exactly 20 records to form a "batch" before saving them. The API gives you records in chunks of 3.
This code runs forever and crashes your terminal (an infinite loop). Do not fix it by guessing! Use the debugger to find out why it misses the target.
1. Set a breakpoint on the line current_count += 3.
2. Start the debugger and watch the current_count variable in the Variables panel on the left.
3. What value does current_count have when the loop should stop, but doesn't?
target_batch_size = 20
current_count = 0
print("--- Starting Batch Collection ---")
# We need exactly 20 records to close the batch
while current_count != target_batch_size:
    print(f"Status: We have {current_count} records...")
    # Simulate fetching 3 records at a time
    current_count += 3
print("Batch successfully collected!")
Once you see the variable skip past 20 in the debugger, you'll realize why != (not equal) is dangerous here. How would you change the while condition to make it safe?
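If you would rather observe the loop without freezing your terminal, one option is a temporary safety cap. In this sketch, max_iterations, iterations, and seen are hypothetical additions for investigation only; the original != comparison is left in place so the question above stays open.

```python
target_batch_size = 20
current_count = 0
max_iterations = 10  # hypothetical cap, only for investigating
seen = []            # every value current_count takes at the top of the loop

iterations = 0
while current_count != target_batch_size and iterations < max_iterations:
    seen.append(current_count)
    # Simulate fetching 3 records at a time
    current_count += 3
    iterations += 1

print(seen)
print(f"Hit the target exactly? {target_batch_size in seen}")
```

A cap like this is a diagnostic aid rather than a fix: it stops the runaway loop, but the batch still never closes at the intended size.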
<aside> 🤓 Curious Geek: The "first computer bug"
The word "bug" for a technical defect is older than computers: Thomas Edison used it in an 1878 letter complaining about telegraph wiring. The famous "first actual bug" is from September 9, 1947: a moth got stuck in relay #70, panel F of the Harvard Mark II electromechanical computer. Operators (including Grace Hopper) taped the moth into the engineering log under the entry "First actual case of bug being found." The log page survives in the Smithsonian's National Museum of American History. The story is sometimes told as the origin of the word; that's wrong, but the moth genuinely was a physical bug, and the fault it caused was in the hardware, not the software.
</aside>
The fastest way to internalise tracebacks and the debugger is to use them on a script that is already broken.
<aside>
📝 Practice: The week's Practice chapter has two exercises where the debugger pays for itself fast: Ex 2 (the Data Cleaner has a TypeError and a logic bug) and Ex 3 (the Precision Trap: stepping through 0.1 + 0.2 == 0.3 shows you why floats lie). Try at least one with breakpoints set instead of print().
</aside>
<aside> 🚀 Try it in the widget: Interactive Quiz: Errors and Debugging
</aside>
https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_1_ch7_reading_errors_quiz&embed=1
- What is the difference between a SyntaxError and a runtime error?
- What is the purpose of a breakpoint (the red dot 🔴)?
Next up: Logging in Python, where you replace ad-hoc print() debugging with the logging module: leveled messages, configurable output, and a habit you keep using once code reaches production.