Week 1 - Foundational Python

Command-Line Interface Habits

Data pipelines are typically run from the command line, not by clicking buttons in an IDE. In this section, you'll learn professional habits for running Python scripts and handling command-line arguments.

Running Python Scripts

Basic execution

# Run a Python script
python script.py

# Explicitly use Python 3.11
python3.11 script.py

# Run a module (useful for packages)
python -m mypackage.script

Running with arguments

# Pass arguments to your script
python process.py input.csv output.json

# With named flags
python process.py --input data.csv --output result.json --verbose

💡 When your virtual environment is activated, the python command points to that environment's interpreter and installed packages. Always activate your venv before running scripts!
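
Before reaching for a parser, it can help to see what a script actually receives. The sketch below is illustrative only: describe_args is a hypothetical helper, not part of the lesson's scripts. Python exposes the raw command line as the list sys.argv, with the script name at index 0 and every argument as a string.

```python
import sys


def describe_args(argv: list[str]) -> str:
    """Summarise a raw argument list shaped like sys.argv."""
    # The first element is the script name; the rest are the arguments.
    script, *rest = argv
    return f"{script} received {len(rest)} argument(s): {rest}"


# Simulate `python process.py input.csv output.json`
print(describe_args(["process.py", "input.csv", "output.json"]))

# In a real script you would pass sys.argv itself:
# print(describe_args(sys.argv))
```

Everything arrives as plain strings, with no validation and no help text, which is the gap argparse fills.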

The argparse Module

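The argparse module is Python's built-in way to handle command-line arguments. It parses positional arguments and flags, converts values to the types you specify, and generates help text automatically.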

Basic argparse example

# process.py
import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Process a data file."
    )
    parser.add_argument("input", help="Input file path")
    parser.add_argument("output", help="Output file path")

    args = parser.parse_args()

    print(f"Input: {args.input}")
    print(f"Output: {args.output}")

if __name__ == "__main__":
    main()

Run it:

python process.py data.csv result.json
# Output:
# Input: data.csv
# Output: result.json

Automatic help

argparse generates help text automatically:

python process.py --help
# Output:
# usage: process.py [-h] input output
#
# Process a data file.
#
# positional arguments:
#   input       Input file path
#   output      Output file path
#
# options:
#   -h, --help  show this help message and exit
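
The same machinery that prints help also rejects incomplete input. As a minimal sketch (build_parser is a hypothetical name, and prog="process.py" is set only so the usage line matches the example above), passing a list to parse_args() stands in for the real command line, which makes the behaviour easy to try in a Python session:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the parser from process.py."""
    parser = argparse.ArgumentParser(
        prog="process.py",
        description="Process a data file.",
    )
    parser.add_argument("input", help="Input file path")
    parser.add_argument("output", help="Output file path")
    return parser


parser = build_parser()

# A list passed to parse_args() replaces sys.argv[1:].
args = parser.parse_args(["data.csv", "result.json"])
print(args.input, args.output)

# A missing positional argument makes argparse print a usage
# message to stderr and raise SystemExit with status 2.
try:
    parser.parse_args(["data.csv"])
except SystemExit as exc:
    print(f"argparse exited with code {exc.code}")
```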

Positional vs Optional Arguments

Positional arguments (required)

parser.add_argument("input_file", help="The file to process")
parser.add_argument("output_file", help="Where to save results")

Usage: python script.py input.csv output.json

Optional arguments (flags)

parser.add_argument(
    "--verbose", "-v",
    action="store_true",
    help="Enable verbose output"
)
parser.add_argument(
    "--limit", "-l",
    type=int,
    default=100,
    help="Maximum number of records (default: 100)"
)

Usage: python script.py input.csv output.json --verbose --limit 50

⚠️ Optional arguments start with -- (or - for short form). Positional arguments don't have dashes.
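
To see how the two kinds of argument combine, here is a small sketch that reuses the definitions above and parses explicit lists instead of the real command line. The prog name and the sample file names are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument("input_file", help="The file to process")
parser.add_argument("output_file", help="Where to save results")
parser.add_argument("--verbose", "-v", action="store_true",
                    help="Enable verbose output")
parser.add_argument("--limit", "-l", type=int, default=100,
                    help="Maximum number of records (default: 100)")

# Flags omitted: the defaults apply.
defaults = parser.parse_args(["input.csv", "output.json"])
print(defaults.verbose, defaults.limit)  # False 100

# Short forms work too, and flags may come before the positionals.
custom = parser.parse_args(["-v", "-l", "50", "input.csv", "output.json"])
print(custom.verbose, custom.limit)  # True 50
```

Note that type=int means custom.limit is the integer 50, not the string "50".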

Common argparse Patterns

A complete data pipeline script

#!/usr/bin/env python3
"""
Data processing pipeline script.

Usage:
    python pipeline.py input.csv output.json
    python pipeline.py input.csv output.json --verbose --skip-header
"""
import argparse
import sys

def create_parser() -> argparse.ArgumentParser:
    """Create and configure the argument parser."""
    parser = argparse.ArgumentParser(
        description="Process CSV data and output JSON.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python pipeline.py data.csv result.json
  python pipeline.py data.csv result.json --verbose
  python pipeline.py data.csv result.json --limit 100
        """
    )

    # Positional arguments
    parser.add_argument(
        "input",
        help="Path to input CSV file"
    )
    parser.add_argument(
        "output",
        help="Path to output JSON file"
    )

    # Optional arguments
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--limit", "-l",
        type=int,
        default=None,
        help="Limit number of records to process"
    )
    parser.add_argument(
        "--skip-header",
        action="store_true",
        help="Skip the first row of the CSV"
    )

    return parser

def process_data(input_path: str, output_path: str,
                 verbose: bool = False, limit: int | None = None,
                 skip_header: bool = False) -> None:
    """Process data from input to output."""
    if verbose:
        print(f"Reading from: {input_path}")
        print(f"Writing to: {output_path}")
        if limit is not None:
            print(f"Limiting to {limit} records")

    # Actual processing would go here
    print("Processing complete!")

def main() -> int:
    """Main entry point."""
    parser = create_parser()
    args = parser.parse_args()

    try:
        process_data(
            input_path=args.input,
            output_path=args.output,
            verbose=args.verbose,
            limit=args.limit,
            skip_header=args.skip_header
        )
        return 0  # Success
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1  # Failure

if __name__ == "__main__":
    sys.exit(main())

💡 The pattern of main() returning an exit code and sys.exit(main()) is professional practice. Exit code 0 means success, non-zero means failure.
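
The exit code matters because whatever launches your script (a shell, a scheduler, another Python program) reads it to decide what to do next. A rough sketch of the caller's side, using a hypothetical helper run_and_report and one-line child programs in place of a real pipeline script:

```python
import subprocess
import sys


def run_and_report(code: str) -> int:
    """Run a one-line Python program in a child process; return its exit code."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


ok = run_and_report("import sys; sys.exit(0)")
failed = run_and_report("import sys; sys.exit(1)")

print(f"first run exited with {ok}, second run exited with {failed}")
```

In a shell, the same value is available as $? immediately after the command finishes.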

Argument Types and Validation

Specifying types

# Integer argument
parser.add_argument("--count", type=int, default=10)

# Float argument
parser.add_argument("--threshold", type=float, default=0.5)

# File argument (argparse opens the file for reading and exits with an error if it cannot)
parser.add_argument("--config", type=argparse.FileType("r"))
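
The type parameter accepts any callable that takes the raw string and returns the converted value, so you can write your own validators. A minimal sketch, assuming a hypothetical positive_int helper; raising argparse.ArgumentTypeError lets argparse turn the failure into a normal usage error:

```python
import argparse


def positive_int(value: str) -> int:
    """Convert a command-line string to an int greater than zero."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if number <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} must be greater than 0")
    return number


parser = argparse.ArgumentParser()
parser.add_argument("--count", type=positive_int, default=10)

print(parser.parse_args(["--count", "5"]).count)  # 5

# An invalid value produces a usage message on stderr and exit status 2.
try:
    parser.parse_args(["--count", "0"])
except SystemExit as exc:
    print(f"rejected with exit code {exc.code}")
```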

Choices (restricted values)

parser.add_argument(
    "--format",
    choices=["json", "csv", "xml"],
    default="json",
    help="Output format (default: json)"
)

parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level"
)
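
One way a restricted choice might be consumed, as a sketch: because the allowed values match the level names in the standard logging module, the chosen string can be looked up there directly. Note that the dash in --log-level becomes an underscore on the parsed namespace.

```python
import argparse
import logging

parser = argparse.ArgumentParser()
parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level",
)

args = parser.parse_args(["--log-level", "DEBUG"])

# --log-level is read back as args.log_level (dash -> underscore).
level = getattr(logging, args.log_level)
logging.basicConfig(level=level)

print(args.log_level, level)  # DEBUG 10
```

Any value outside the choices list is rejected by argparse before your code runs, so the getattr lookup cannot fail here.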

Required optional arguments

parser.add_argument(
    "--api-key",
    required=True,
    help="API key (required)"
)

Exit Codes

Exit codes tell the calling process whether your script succeeded or failed.

import sys

def main():
    error_occurred = False  # Placeholder: set to True when the work fails
    # ... do work ...

    if error_occurred:
        print("Error: something went wrong", file=sys.stderr)
        sys.exit(1)  # Non-zero = failure

    print("Success!")
    sys.exit(0)  # Zero = success

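Common exit codes:

  0: success
  1: general error
  2: command-line usage error (argparse exits with 2 when arguments are invalid)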

⌨️ Hands-on: Create a script greet.py that accepts a --name argument (default: "World") and prints "Hello, {name}!". Add a --loud flag that makes it print in uppercase.

Environment Variables

Sometimes you want to configure scripts via environment variables instead of command-line arguments.

import argparse
import os
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api-key",
        default=os.environ.get("API_KEY"),
        help="API key (or set API_KEY environment variable)"
    )
    args = parser.parse_args()

    if not args.api_key:
        print("Error: API key required", file=sys.stderr)
        sys.exit(1)

    print(f"Using API key: {args.api_key[:4]}...")

if __name__ == "__main__":
    main()

Usage:

# Via argument
python script.py --api-key secret123

# Via environment variable
export API_KEY=secret123
python script.py

# Inline (Linux/macOS)
API_KEY=secret123 python script.py

⚠️ Never print full API keys or passwords! The example above only prints the first 4 characters.
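
When both sources are set, the command-line flag wins, because the environment variable only supplies the default. The sketch below illustrates that precedence with a stand-in dictionary instead of the real environment; build_parser, mask, and fake_env are hypothetical names introduced for this example.

```python
import argparse
import os
from collections.abc import Mapping


def build_parser(env: Mapping[str, str]) -> argparse.ArgumentParser:
    """Build a parser whose --api-key default comes from the given environment."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api-key",
        default=env.get("API_KEY"),
        help="API key (or set API_KEY environment variable)",
    )
    return parser


def mask(secret: str) -> str:
    """Show only the first 4 characters of a secret."""
    return secret[:4] + "..."


fake_env = {"API_KEY": "secret123"}  # stand-in for os.environ

from_env = build_parser(fake_env).parse_args([])
from_flag = build_parser(fake_env).parse_args(["--api-key", "override456"])

print(mask(from_env.api_key))   # secr...
print(mask(from_flag.api_key))  # over...

# In a real script you would pass the actual environment:
# args = build_parser(os.environ).parse_args()
```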

Best Practices

  1. Always add descriptions to your parser and arguments
  2. Use type hints in your functions that receive parsed arguments
  3. Return exit codes to indicate success/failure
  4. Print errors to stderr using print(..., file=sys.stderr)
  5. Support --help (argparse does this automatically)
  6. Use if __name__ == "__main__" to allow importing without running

🧠 Knowledge Check

  1. Why is using input() (asking for user typing) bad for automated data pipelines?
  2. What does an exit code of 0 usually signal to the operating system?
  3. How does using argparse help other people use your script?
