📝 Logging in Python

In the previous chapter, we used print() to debug our code. While print() is great for quick scripts, it is bad practice for real data engineering applications.

Why?

No context: print just outputs text. It doesn’t tell you when it happened or how serious it is.
Hard to control: You can’t easily turn off print statements when your application goes to production.
Performance: Printing too much can slow down your code.

In professional Python code, we use the logging module.

The Logging Module

Python has a built-in module called logging. It works similarly to console.log, console.error, etc., in JavaScript, but it’s much more powerful.

To use it, you first need to import it:

import logging

Logging Levels

Logging uses “levels” to indicate how important a message is.

Level	Function	When to use
DEBUG	`logging.debug()`	Detailed info, only useful for diagnosing problems.
INFO	`logging.info()`	Confirmation that things are working as expected.
WARNING	`logging.warning()`	Something unexpected happened, but the software is still working.
ERROR	`logging.error()`	A more serious problem; the software couldn’t perform a function.
CRITICAL	`logging.critical()`	A very serious error; the program itself may be unable to continue running.
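
Under the hood, each level is an integer constant, and a message is shown only when its level is greater than or equal to the configured level. The short sketch below prints those constants and works out which levels pass a WARNING threshold; the names `levels`, `configured_level`, and `shown` are only for illustration.

```python
import logging

# Each level name is a constant that holds an integer.
levels = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}

for name, value in levels.items():
    print(f"{name:<8} = {value}")

# A message is shown only if its level is >= the configured level.
configured_level = logging.WARNING
shown = [name for name, value in levels.items() if value >= configured_level]
print("Shown at WARNING:", shown)
# Shown at WARNING: ['WARNING', 'ERROR', 'CRITICAL']
```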

Basic Configuration

By default, the logging module only shows messages at level WARNING or higher. To see DEBUG or INFO messages, we need to configure it with logging.basicConfig().

import logging

# Configure the logging system
logging.basicConfig(level=logging.INFO)

logging.debug("This will NOT show")
logging.info("This will show")
logging.warning("This will show")
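
One detail that often surprises beginners: basicConfig() only takes effect if the root logger has no handlers yet, so a second call is silently ignored. The sketch below illustrates this, assuming Python 3.8 or newer for the `force=True` argument; the variable names are only for illustration.

```python
import logging

# force=True here just makes sure the example starts from a known state.
logging.basicConfig(level=logging.WARNING, force=True)

# This second call does nothing: the root logger already has a handler.
logging.basicConfig(level=logging.DEBUG)
level_after_second_call = logging.getLogger().level
print(logging.getLevelName(level_after_second_call))  # WARNING

# force=True (Python 3.8+) removes the existing handlers and reconfigures.
logging.basicConfig(level=logging.DEBUG, force=True)
level_after_force = logging.getLogger().level
print(logging.getLevelName(level_after_force))  # DEBUG
```

If changing the level seems to have no effect, for example in an interactive session where basicConfig() has already run, this is a likely cause.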

🖐 Hands-on: First Logs

Create a file called app.py. Copy the code below and run it. Then, change the level in basicConfig to logging.DEBUG and run it again. What changes?

import logging

logging.basicConfig(level=logging.WARNING)

print("--- Starting Program ---")
logging.debug("Connecting to database...")
logging.info("Connection successful.")
logging.warning("Disk space is running low.")
logging.error("Database connection failed!")

Formatting Logs

One of the best features of logging is formatting. You can automatically add a timestamp and the level name to every message. This is crucial for data engineers, who need to know exactly when a pipeline failed.

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

logging.info("Data processing started")

Output:

2023-10-25 14:30:01,123 - INFO - Data processing started
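
The format string can include other fields too. As a minimal sketch, the example below adds the file name and line number of the logging call and uses the `datefmt` argument to drop the milliseconds from the timestamp. The constants `LOG_FORMAT` and `DATE_FORMAT` and the messages are assumptions made for this example.

```python
import logging

# %(filename)s and %(lineno)d show where the logging call was made.
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(filename)s:%(lineno)d - %(message)s"

# datefmt uses the same codes as time.strftime(); this one omits milliseconds.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

logging.basicConfig(level=logging.INFO, format=LOG_FORMAT, datefmt=DATE_FORMAT)

logging.info("Data processing started")
logging.warning("Input file is larger than expected")
```

Each line then has the shape `timestamp - LEVEL - file:line - message`, which makes it easier to find the exact place in the code that produced a message.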

🚀 Final Exercise: Refactoring Print to Log

You have received a messy script from a colleague that uses print everywhere. Your task is to upgrade it to use logging.

Import logging.
Configure it to show INFO level messages and include timestamps.
Replace the print statements with the appropriate logging level (info, warning, or error).

Colleague’s Script:

def process_data(data):
    print("Starting data processing...")

    if not data:
        print("ERROR: No data found!")
        return

    print(f"Processing {len(data)} records...")

    for record in data:
        if record < 0:
            print(f"Warning: Negative value found: {record}")

    print("Processing complete.")

my_data = [10, 20, -5, 30]
process_data(my_data)

🧠 Knowledge Check

Why should you use logging instead of print in a professional application?
Which logging level would you use for a message that says “User logged in successfully”?

📚 Extra Reading

Logging is a massive topic. As a Data Engineer, you will eventually need to log to files, external servers, or cloud services. These resources will help you take the next step:

Real Python: Logging in Python - Probably the best long-form guide on moving from basic to advanced logging configurations.
Corey Schafer (Video): Python Tutorial: Logging Basics - Logging to Files, Setting Levels, and Formatting - A must-watch video that shows how to send your logs to a .log file instead of just the console.
Official Python Docs: Logging HOWTO - The “official” way to learn about the more complex features of the logging module.
PyVideo: Exceptional Logging - Search for “Logging” to find various conference talks about how professional engineers structure their logs for large data pipelines.
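
As a small taste of logging to a file, here is a minimal sketch that uses the `filename` argument of basicConfig(). The file name `pipeline_example.log`, its location in the system temp folder, and the messages are assumptions made for this example; `force=True` requires Python 3.8 or newer.

```python
import logging
import os
import tempfile

# Hypothetical log file, placed in the system temp folder for this example.
log_path = os.path.join(tempfile.gettempdir(), "pipeline_example.log")

logging.basicConfig(
    filename=log_path,   # write to this file instead of the console
    filemode="w",        # start with an empty file on every run
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    force=True,          # replace any earlier configuration (Python 3.8+)
)

logging.info("Pipeline started")
logging.error("Could not read input file")

# Read the file back to confirm that both messages were written.
with open(log_path) as log_file:
    log_lines = log_file.read().splitlines()

print(len(log_lines), "lines written to", log_path)
```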

CC BY-NC-SA 4.0 Icons

*https://hackyourfuture.net/*
