Data engineering is all about processing streams of data. To do that, you need to make decisions (conditionals) and repeat actions (loops). This lesson covers the logic you'll use in almost every script.
## Conditionals (if/elif/else)
Conditionals let your code make decisions based on data.
score = 85
if score >= 90:
    print("Grade: A")
elif score >= 80:
    print("Grade: B")
else:
    print("Grade: C or lower")
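Python checks the branches from top to bottom and runs only the first one that matches, so the order of the conditions matters. A minimal sketch of what goes wrong when the broader test comes first (the variable names `wrong_grade` and `right_grade` are illustrative, not part of the lesson):

```python
score = 95

# Wrong order: the broader test comes first, so 95 is reported as a B
if score >= 80:
    wrong_grade = "B"
elif score >= 90:
    wrong_grade = "A"  # never reached: any score >= 90 already matched above
else:
    wrong_grade = "C or lower"

# Right order: the highest threshold is tested first
if score >= 90:
    right_grade = "A"
elif score >= 80:
    right_grade = "B"
else:
    right_grade = "C or lower"

print(wrong_grade)  # B
print(right_grade)  # A
```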
Python allows you to check if lists, strings, or numbers are "empty" or "zero" directly:
# Check if a list has items
users = []
if not users:
    print("No users found!")

# Check if a string is missing
name = None
if not name:
    print("Name is missing")

# Check if a number is non-zero
count = 0
if count:
    print(f"Count is {count}")
else:
    print("Count is zero")
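One caveat with these checks: `0`, `""`, `[]`, and `None` are all falsy, so `if not value:` cannot tell a missing value from a legitimate zero. A small sketch of the difference, using made-up sample values; `is None` is the usual way to test specifically for a missing value:

```python
readings = [12, 0, None, 7]  # hypothetical sensor values; None means "missing"

falsy = []
missing = []
for reading in readings:
    if not reading:
        falsy.append(reading)    # catches both 0 and None
    if reading is None:
        missing.append(reading)  # catches only None

print(falsy)    # [0, None]
print(missing)  # [None]
```

If a zero is valid data in your pipeline, the `is None` check is the safer of the two.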
## for Loops
Use for loops when you want to iterate over a collection (like a list of files or rows in a CSV). One pass through the loop body is one iteration; the for loop runs one iteration per item and stops on its own when the collection runs out of items. The collection itself is called an iterable: lists, tuples, strings, dicts, files, and range() objects all qualify. To get a running counter alongside each value, wrap the iterable in enumerate().
# Loop over a list
files = ["data1.csv", "data2.csv", "data3.csv"]
for filename in files:
    print(f"Processing {filename}...")

# Loop with an index using enumerate()
for i, filename in enumerate(files):
    print(f"File {i+1}: {filename}")

# Loop over a dictionary
user = {"name": "Alice", "role": "Engineer"}
for key, value in user.items():
    print(f"{key}: {value}")
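The same `for` syntax works on the other iterables mentioned above. A short sketch with illustrative values: `range()` produces a sequence of integers, a string yields its characters, and `enumerate()` accepts an optional `start` argument so you do not need `i+1`:

```python
# range(3) yields 0, 1, 2
batch_numbers = []
for n in range(3):
    batch_numbers.append(n)

# A string is iterable character by character
letters = []
for char in "csv":
    letters.append(char)

# enumerate(..., start=1) begins the counter at 1 instead of 0
files = ["data1.csv", "data2.csv", "data3.csv"]
labels = []
for i, filename in enumerate(files, start=1):
    labels.append(f"File {i}: {filename}")

print(batch_numbers)  # [0, 1, 2]
print(letters)        # ['c', 's', 'v']
print(labels)         # ['File 1: data1.csv', 'File 2: data2.csv', 'File 3: data3.csv']
```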
## while Loops
Use while loops when you don't know how many times to repeat, but you have a condition to stop (e.g., waiting for a file to appear, or retrying a network request).
import time

retries = 3
while retries > 0:
    print(f"Connecting to database... ({retries} retries left)")
    # simulate connection attempt
    success = False
    if success:
        print("Connected!")
        break  # Exit the loop immediately
    retries -= 1
    time.sleep(1)  # Wait 1 second before retrying

if retries == 0:
    print("Failed to connect.")
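The example above never succeeds because `success` is hard-coded to `False`. Here is a variation where the connection works on the third attempt. It assumes a made-up list, `attempt_results`, that stands in for real connection attempts; no real database is involved, and the sleep is left out so it runs instantly:

```python
attempt_results = [False, False, True]  # hypothetical outcome of each attempt

retries = 3
attempts_made = 0
connected = False

while retries > 0:
    success = attempt_results[attempts_made]  # stand-in for a real connection call
    attempts_made += 1
    if success:
        connected = True
        break  # stop retrying as soon as the connection works
    retries -= 1

if connected:
    print(f"Connected after {attempts_made} attempt(s).")
else:
    print("Failed to connect.")
```

Tracking the outcome in a separate flag such as `connected` makes the final check independent of how the retry counter is managed.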
## break and continue
- break: Exits the loop entirely.
- continue: Skips the rest of the current iteration and jumps to the next one.

records = [10, 20, -1, 30, -5, 40]
valid_records = []
for record in records:
    if record < 0:
        print(f"Skipping invalid record: {record}")
        continue  # Skip negative numbers
    if record > 50:
        print("Limit reached, stopping.")
        break  # Stop processing if value is too high
    valid_records.append(record)
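With the list above, no value exceeds 50, so `break` never fires and `valid_records` ends up as `[10, 20, 30, 40]`. The sketch below runs the same logic on an illustrative list that does contain a value above 50, to show that everything after that value is never examined (the `skipped` counter is an addition for demonstration):

```python
records = [10, -1, 60, 20]  # illustrative input; 60 exceeds the limit of 50
valid_records = []
skipped = 0

for record in records:
    if record < 0:
        skipped += 1
        continue  # -1 is skipped, the loop moves on to 60
    if record > 50:
        break  # 60 stops the loop, so 20 is never examined
    valid_records.append(record)

print(valid_records)  # [10]
print(skipped)        # 1
```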
List comprehensions are a concise, "Pythonic" way to create lists. They are extremely popular in data engineering for simple transformations.
# [expression for item in iterable if condition]
1. Transform a list (Map)
# Old way
numbers = [1, 2, 3, 4]
squared = []
for num in numbers:
    squared.append(num ** 2)
# List comprehension way
squared = [num ** 2 for num in numbers]
# Result: [1, 4, 9, 16]
2. Filter a list
# Get only even numbers
evens = [num for num in numbers if num % 2 == 0]
3. Clean data strings
raw_names = [" Alice ", "Bob", " Charlie "]
clean_names = [name.strip().lower() for name in raw_names]
# Result: ['alice', 'bob', 'charlie']
<aside>
⚠️ Pro Tip: If your list comprehension is getting too complex (e.g., nested loops or multiple distinct conditions), switch back to a regular for loop for readability.
</aside>
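As a rough illustration of that tip, the sketch below writes the same cleanup twice, using made-up input data. Both versions produce the same list; the loop is longer, but each rule gets its own line and comment:

```python
raw_rows = [[" Alice ", ""], ["BOB", None], [" charlie", "Dana "]]  # illustrative data

# A nested loop plus two conditions squeezed into one comprehension: hard to scan
dense = [name.strip().lower() for row in raw_rows for name in row if name is not None and name.strip()]

# The same logic as a regular loop
readable = []
for row in raw_rows:
    for name in row:
        if name is None:
            continue  # skip missing values
        cleaned = name.strip().lower()
        if not cleaned:
            continue  # skip strings that were empty or only whitespace
        readable.append(cleaned)

print(readable)  # ['alice', 'bob', 'charlie', 'dana']
```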
Sometimes you need to put a loop inside another loop.
departments = {
    "Engineering": ["Alice", "Bob"],
    "Sales": ["Charlie", "David"]
}
for dept, employees in departments.items():
    print(f"--- {dept} ---")
    for employee in employees:
        print(f" - {employee}")

Output:
--- Engineering ---
 - Alice
 - Bob
--- Sales ---
 - Charlie
 - David
This pattern is useful when processing grouped data (e.g., records by department, transactions by customer).
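The same shape works for the transactions-by-customer case. A minimal sketch with invented customer names and amounts, where the outer loop visits each group and the inner loop adds up that group's values:

```python
transactions_by_customer = {  # invented sample data
    "alice": [20.0, 5.5],
    "bob": [12.0],
    "charlie": [],
}

totals = {}
for customer, amounts in transactions_by_customer.items():
    total = 0.0
    for amount in amounts:
        total += amount
    totals[customer] = total  # a customer with no transactions keeps 0.0

print(totals)  # {'alice': 25.5, 'bob': 12.0, 'charlie': 0.0}
```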
<aside> Curious Geek: Why list comprehensions exist
List comprehensions arrived in Python 2.0 in October 2000 via PEP 202, heavily inspired by the same syntax in Haskell and (further back) the set-builder notation from mathematics: {x² | x ∈ ℕ, x < 10}. Guido van Rossum championed them because the explicit for-loop-and-append pattern was the most-typed three lines in every Python program. The same impulse later gave Python generator expressions (PEP 289), dict comprehensions, and set comprehensions: each one is "a for loop where the only goal was to build a new collection."
</aside>
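For reference, the related forms named in the aside look like this. A brief sketch with illustrative data; compared with a list comprehension, the main difference is the surrounding brackets (plus a `key: value` pair for dicts):

```python
names = ["alice", "bob", "alice", "charlie"]  # illustrative data

# Set comprehension: curly braces, duplicates collapse
unique_lengths = {len(name) for name in names}

# Dict comprehension: curly braces with key: value
name_lengths = {name: len(name) for name in names}

# Generator expression: parentheses, values are produced one at a time
total_chars = sum(len(name) for name in names)

print(unique_lengths)  # the values 3, 5 and 7 (set order is not guaranteed)
print(name_lengths)    # {'alice': 5, 'bob': 3, 'charlie': 7}
print(total_chars)     # 20
```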
Test your understanding of loops and logic with this interactive exercise:
<aside> Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?exercise=control_flow
</aside>
Challenge: The code in the widget loops through a list but has logic errors. Use break and continue to filter the data correctly as per the instructions in the widget.
<aside> Practice: The week's Practice chapter has two exercises that build on this chapter: Ex 2 (the Data Cleaner: loops + conditionals over a dirty list) and Ex 4 (Grade Processor: branching logic on a dictionary). Both take a few minutes and run in your venv.
</aside>
<aside> Try it in the widget: Interactive Quiz: Control Flow
</aside>
https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_1_ch3_control_flow_quiz&embed=1
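1. What is the truthiness of [] in Python, and how would you idiomatically check if a list has at least one item?
2. When would you use a while loop instead of a for loop? Give one data-engineering example.
3. Rewrite this as a list comprehension: squared = [] then for x in range(5): squared.append(x**2).
4. What is the difference between break and continue?
5. When would you choose a regular for loop over a list comprehension, even if the list comprehension fits on one line?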
## Conditionals
if x > 10:
    pass
elif x == 5:
    pass
else:
    pass
## For loop
for item in items:
    pass  # do something
## With index
for index, item in enumerate(items):
    pass  # do something
## List Comprehension
new_list = [transform(x) for x in old_list if condition(x)]
## Dictionary Iteration
for key, value in my_dict.items():
    pass  # do something
- break: stop loop
- continue: skip to next iteration
- pass: do nothing (placeholder)

Next up: Functions and Modules, where you package the loops and conditionals you just learned into reusable building blocks for your pipelines.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
