Logging in Python

In the previous chapter, we used print() to debug our code. While print() is great for quick scripts, it is bad practice for real data engineering applications.

Why?

  1. No context: print just outputs text. It doesn't tell you when it happened or how serious it is.
  2. Hard to control: You can't easily turn off print statements when your application goes to production.
  3. Performance: Every print call writes to the console, and there is no central switch to silence them, so noisy output can slow down your code.

In professional Python code, we use the logging module.

The Logging Module

Python has a built-in module called logging. It works similarly to console.log, console.error, etc., in JavaScript, but it's much more powerful.

To use it, you first need to import it:

import logging

Logging Levels

Logging uses "levels" to indicate how important a message is.

Level      Function             When to use
DEBUG      logging.debug()      Detailed info, only useful for diagnosing problems.
INFO       logging.info()       Confirmation that things are working as expected.
WARNING    logging.warning()    Something unexpected happened, but the software is still working.
ERROR      logging.error()      A more serious problem; the software couldn't perform a function.
CRITICAL   logging.critical()   A serious error, indicating the program execution may stop.
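
Each level name is backed by an integer, and that number is what the module compares when it decides whether to show a message. The sketch below prints the built-in values; `LEVELS`, `level_values`, and `shown` are illustrative names, not part of the logging module.

```python
import logging

# Level names in increasing order of severity.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

# Map each name to the integer constant the logging module defines for it.
level_values = {name: getattr(logging, name) for name in LEVELS}

for name, value in level_values.items():
    print(f"{name:<8} = {value}")

# A message is shown only if its level is >= the configured threshold.
threshold = logging.WARNING
shown = [name for name, value in level_values.items() if value >= threshold]
print("Shown at WARNING threshold:", shown)
```

The gaps of ten between the values leave room for custom levels, although most projects never need them.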

Basic Configuration

By default, Python only shows messages that are WARNING or higher. To see DEBUG or INFO messages, we need to configure the logger.

import logging

# Configure the logging system
logging.basicConfig(level=logging.INFO)

logging.debug("This will NOT show")
logging.info("This will show")
logging.warning("This will show")

Hands-on: First Logs

Create a file called app.py. Copy the code below and run it. Then, change the level in basicConfig to logging.DEBUG and run it again. What changes?

import logging

logging.basicConfig(level=logging.WARNING)

print("--- Starting Program ---")
logging.debug("Connecting to database...")
logging.info("Connection successful.")
logging.warning("Disk space is running low.")
logging.error("Database connection failed!")

Formatting Logs

One of the best features of logging is formatting. You can automatically add a timestamp to every message, which data engineers rely on to see exactly when a pipeline failed.

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

logging.info("Data processing started")

Output:

2023-10-25 14:30:01,123 - INFO - Data processing started
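
The format string is built from named placeholders, and the same mechanism supports other record attributes such as `%(name)s` (the logger name) and a `datefmt` pattern for the timestamp. As a rough sketch, the example below uses `logging.Formatter` directly so the resulting line can be inspected without changing the global configuration. The logger name `"pipeline"` is a hypothetical value chosen for illustration.

```python
import logging

# Same layout as above, plus the logger name; datefmt drops the milliseconds.
formatter = logging.Formatter(
    fmt="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# Build a record by hand so we can look at the formatted line.
# "pipeline" is a made-up logger name used only for this example.
record = logging.makeLogRecord(
    {
        "name": "pipeline",
        "levelno": logging.INFO,
        "levelname": "INFO",
        "msg": "Data processing started",
    }
)

line = formatter.format(record)
print(line)
```

In a normal script you would pass the same `format` and `datefmt` values to `logging.basicConfig()` instead of creating a formatter by hand.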


<aside> โŒจ๏ธ Hands on: Refactor a script that uses print for everything into structured logging calls (DEBUG/INFO/WARNING/ERROR), so the level can be tuned per environment instead of editing every line.

</aside>
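
One way to tune the level per environment is to read it from an environment variable. This is a minimal sketch, not the only approach. The variable name `LOG_LEVEL` and the helper `resolve_level` are assumptions made for this example; the logging module does not read any environment variable on its own.

```python
import logging
import os


def resolve_level(default: str = "INFO") -> int:
    """Return the numeric logging level named in the LOG_LEVEL env variable.

    LOG_LEVEL is a hypothetical variable name chosen for this sketch.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    # Fall back to INFO if the name is not a known level constant.
    return level if isinstance(level, int) else logging.INFO


logging.basicConfig(
    level=resolve_level(),
    format="%(asctime)s - %(levelname)s - %(message)s",
)

logging.debug("Row-by-row details (only visible with LOG_LEVEL=DEBUG)")
logging.info("Loaded input file")
logging.warning("3 rows had missing values")
```

On macOS or Linux you could then run `LOG_LEVEL=DEBUG python app.py` while developing and leave the variable unset in production, without editing any logging call.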


<aside> 🚀 Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=1&chapter=logging_basics&exercise=w1_logging__refactor_print&lang=python

</aside>

🚀 Final Exercise: Refactoring Print to Log

You have received a messy script from a colleague that uses print everywhere. Your task is to upgrade it to use logging.

  1. Import logging.
  2. Configure it to show INFO level messages and include timestamps.
  3. Replace the print statements with the appropriate logging level (info, warning, or error).

Colleague's Script:

def process_data(data):
    print("Starting data processing...")

    if not data:
        print("ERROR: No data found!")
        return

    print(f"Processing {len(data)} records...")

    for record in data:
        if record < 0:
            print(f"Warning: Negative value found: {record}")

    print("Processing complete.")

my_data = [10, 20, -5, 30]
process_data(my_data)
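
If you want to check your work after trying the exercise yourself, here is one possible refactor. It is a sketch, not the only correct answer: the choice of level for each message is a judgment call, and the `%d` / `%s` argument style is just one common way to pass values to logging calls.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)


def process_data(data):
    logging.info("Starting data processing...")

    if not data:
        # The level already says ERROR, so the prefix is no longer needed.
        logging.error("No data found!")
        return

    # Passing values as arguments lets logging skip the string formatting
    # when the message is filtered out by the level.
    logging.info("Processing %d records...", len(data))

    for record in data:
        if record < 0:
            logging.warning("Negative value found: %s", record)

    logging.info("Processing complete.")


my_data = [10, 20, -5, 30]
process_data(my_data)
```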

<aside> 🤓 Curious Geek: Why five log levels (and not four, or seven)?

Python's logging module ships with five standard severity levels (DEBUG, INFO, WARNING, ERROR, CRITICAL). Ranked severities go back to Unix's syslog, written by Eric Allman at Berkeley in the early 1980s, which defined eight levels (emerg, alert, crit, err, warning, notice, info, debug). Python's set is smaller: it has no separate notice level, and a single CRITICAL level sits at the top where syslog has three. PEP 282 (2002), by Vinay Sajip and Trent Mick, proposed the API and lists Java's log4j and java.util.logging among its influences; the five levels closely mirror log4j's DEBUG, INFO, WARN, ERROR, and FATAL. Similar names show up in many logging libraries and cloud-logging backends, though not identically everywhere: java.util.logging, for example, uses SEVERE, WARNING, INFO, CONFIG, FINE, FINER, and FINEST. It is a small piece of 1980s Unix culture that many people still use without knowing where it came from.

</aside>

Picking the right level becomes a habit only once you write logging into a script you actually run end-to-end.

<aside> ๐Ÿ“ Practice: The week's Practice chapter has three exercises where you wire up logging: Ex 1 (Temperature Logger: log on every conversion), Ex 4 (Grade Processor: log per-student decisions at INFO + a summary at the end), and Ex 6 (Pipeline CLI: a --verbose flag that flips DEBUG on). Pick whichever level of effort fits your time budget.

</aside>
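
A `--verbose` flag like the one mentioned for the Pipeline CLI exercise could be wired up roughly as follows. This is a sketch under assumptions: the program name `pipeline` and the helpers `build_parser`, `level_from_args`, and `main` are made up for illustration and are not taken from the exercise itself.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    # The program name and flag are illustrative; adapt them to your exercise.
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="show DEBUG messages as well",
    )
    return parser


def level_from_args(args: argparse.Namespace) -> int:
    # DEBUG when --verbose is given, INFO otherwise.
    return logging.DEBUG if args.verbose else logging.INFO


def main(argv=None) -> None:
    # argv=None makes argparse read the real command-line arguments.
    args = build_parser().parse_args(argv)
    logging.basicConfig(
        level=level_from_args(args),
        format="%(asctime)s - %(levelname)s - %(message)s",
    )
    logging.debug("Parsed arguments: %s", args)
    logging.info("Pipeline started")


if __name__ == "__main__":
    main()
```

Running the script with `--verbose` shows both messages; running it without the flag shows only the INFO line.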

🧠 Knowledge Check


<aside> 🚀 Try it in the widget: Interactive Quiz: Logging Basics

</aside>

https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_1_ch8_logging_basics_quiz&embed=1


Extra reading


Next up: File Operations, where you read and write CSV / JSON / text files using pathlib and the with statement: the basic skill every data pipeline needs to load and persist data.


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0. https://hackyourfuture.net/


Built with ❤️ by the HackYourFuture community · Thank you, contributors
