Command-Line Interface Habits

Data pipelines are typically run from the command line, not by clicking buttons in an IDE. In this section, you'll learn professional habits for running Python scripts and handling command-line arguments.

Running Python Scripts

Basic execution

# Run a Python script
python script.py

# Explicitly use Python 3.11
python3.11 script.py

# Run a module (useful for packages)
python -m mypackage.script

Running with arguments

# Pass arguments to your script
python process.py input.csv output.json

# With named flags
python process.py --input data.csv --output result.json --verbose

💡 When your virtual environment is activated, python resolves to that environment's interpreter, so your script sees the Python version and packages installed there. Always activate your venv before running scripts!
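
If you are unsure which interpreter a script is running under, you can ask Python itself. The following is a minimal sketch, assuming the environment was created with the standard venv module; the helper name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If the second line prints False, the script is probably running under the system Python rather than your project's environment.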

The `argparse` Module

The argparse module is Python's built-in way to handle command-line arguments professionally.

💡 Why use argparse?

Input Validation: It converts each argument to the type you declare (for example, type=int) and exits with a clear error message when the user passes something else.

Documentation: It generates a complete --help menu automatically based on your code.

Professionalism: It handles long flags (like --verbose) and short forms (like -v) exactly like standard Linux/Mac tools.
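
The three points above can be seen in a few lines. This is a small sketch rather than a full script: the program name demo.py is invented, and the argument list is passed to parse_args() explicitly so the example does not depend on how it is launched.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo.py", description="Validation demo.")
parser.add_argument("--limit", "-l", type=int, default=100,
                    help="Maximum number of records")
parser.add_argument("--verbose", "-v", action="store_true",
                    help="Enable verbose output")

# Valid input: the string "50" is converted to the integer 50.
args = parser.parse_args(["-v", "--limit", "50"])
print(args.limit, args.verbose)  # 50 True

# Invalid input: argparse prints a usage error to stderr and raises SystemExit.
try:
    parser.parse_args(["--limit", "fifty"])
    exit_code = 0
except SystemExit as exc:
    exit_code = exc.code
print(f"Exit code for bad input: {exit_code}")  # 2

# The same text that --help would print, built from the help= strings above.
help_text = parser.format_help()
print(help_text)
```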

Basic argparse example

# process.py
import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Process a data file."
    )
    parser.add_argument("input", help="Input file path")
    parser.add_argument("output", help="Output file path")

    args = parser.parse_args()

    print(f"Input: {args.input}")
    print(f"Output: {args.output}")

if __name__ == "__main__":
    main()

Run it:

python process.py data.csv result.json
# Output:
# Input: data.csv
# Output: result.json

Automatic help

argparse generates help text automatically:

python process.py --help
# Output:
# usage: process.py [-h] input output
#
# Process a data file.
#
# positional arguments:
#   input       Input file path
#   output      Output file path
#
# options:
#   -h, --help  show this help message and exit

Positional vs Optional Arguments

Positional arguments (required)

parser.add_argument("input_file", help="The file to process")
parser.add_argument("output_file", help="Where to save results")

Usage: python script.py input.csv output.json

Optional arguments (flags)

parser.add_argument(
    "--verbose", "-v",
    action="store_true",
    help="Enable verbose output"
)
parser.add_argument(
    "--limit", "-l",
    type=int,
    default=100,
    help="Maximum number of records (default: 100)"
)

Usage: python script.py input.csv output.json --verbose --limit 50

⚠️ Optional arguments start with -- (or - for short form). Positional arguments don't have dashes.
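
To see how both kinds of argument end up on the parsed result, here is a short sketch. It passes the argument list to parse_args() directly instead of reading the real command line; the file name input.csv is only sample data.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument("input_file", help="The file to process")
parser.add_argument("--verbose", "-v", action="store_true")
parser.add_argument("--limit", "-l", type=int, default=100)
parser.add_argument("--skip-header", action="store_true")

# Flags may appear before or after positional arguments.
args = parser.parse_args(["--limit", "50", "input.csv", "-v"])
print(args.input_file)   # input.csv
print(args.limit)        # 50
print(args.verbose)      # True

# Dashes inside a flag name become underscores on the result object.
print(args.skip_header)  # False

# Omitted optional arguments fall back to their defaults.
defaults = parser.parse_args(["input.csv"])
print(defaults.limit, defaults.verbose)  # 100 False
```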

Common argparse Patterns

A complete data pipeline script

#!/usr/bin/env python3
"""
Data processing pipeline script.

Usage:
    python pipeline.py input.csv output.json
    python pipeline.py input.csv output.json --verbose --skip-header
"""
import argparse
import sys

def create_parser() -> argparse.ArgumentParser:
    """Create and configure the argument parser."""
    parser = argparse.ArgumentParser(
        description="Process CSV data and output JSON.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python pipeline.py data.csv result.json
  python pipeline.py data.csv result.json --verbose
  python pipeline.py data.csv result.json --limit 100
        """
    )

    # Positional arguments
    parser.add_argument(
        "input",
        help="Path to input CSV file"
    )
    parser.add_argument(
        "output",
        help="Path to output JSON file"
    )

    # Optional arguments
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--limit", "-l",
        type=int,
        default=None,
        help="Limit number of records to process"
    )
    parser.add_argument(
        "--skip-header",
        action="store_true",
        help="Skip the first row of the CSV"
    )

    return parser

def process_data(input_path: str, output_path: str,
                 verbose: bool = False, limit: int | None = None,
                 skip_header: bool = False) -> None:
    """Process data from input to output."""
    if verbose:
        print(f"Reading from: {input_path}")
        print(f"Writing to: {output_path}")
        if limit is not None:
            print(f"Limiting to {limit} records")

    # Actual processing would go here
    print("Processing complete!")

def main() -> int:
    """Main entry point."""
    parser = create_parser()
    args = parser.parse_args()

    try:
        process_data(
            input_path=args.input,
            output_path=args.output,
            verbose=args.verbose,
            limit=args.limit,
            skip_header=args.skip_header
        )
        return 0  # Success
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1  # Failure

if __name__ == "__main__":
    sys.exit(main())

💡 The pattern of main() returning an exit code and sys.exit(main()) is professional practice. Exit code 0 means success, non-zero means failure.
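
The pipeline script above leaves the processing step as a placeholder. One possible way to fill it in is sketched below, assuming rows are copied as plain lists of strings. The function name convert_csv_to_json and the sample data are hypothetical and not part of the original script.

```python
import csv
import json
import tempfile
from pathlib import Path


def convert_csv_to_json(input_path, output_path,
                        limit=None, skip_header=False) -> int:
    """Copy rows from a CSV file into a JSON array; return the row count."""
    rows = []
    with open(input_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        if skip_header:
            next(reader, None)  # Discard the first row if there is one
        for row in reader:
            if limit is not None and len(rows) >= limit:
                break
            rows.append(row)

    with open(output_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, indent=2)
    return len(rows)


# Try it on a throwaway file so nothing on disk is touched permanently.
with tempfile.TemporaryDirectory() as tmp:
    src_file = Path(tmp) / "data.csv"
    dst_file = Path(tmp) / "result.json"
    src_file.write_text("name,score\nada,90\ngrace,95\nlinus,80\n",
                        encoding="utf-8")
    written = convert_csv_to_json(src_file, dst_file,
                                  limit=2, skip_header=True)
    result = json.loads(dst_file.read_text(encoding="utf-8"))

print(written)  # 2
print(result)   # [['ada', '90'], ['grace', '95']]
```

Any exception raised here (a missing input file, for example) would be caught by the try/except in main() and turned into exit code 1.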

Argument Types and Validation

Specifying types

# Integer argument
parser.add_argument("--count", type=int, default=10)

# Float argument
parser.add_argument("--threshold", type=float, default=0.5)

# Readable file (argparse opens it and reports an error if it cannot be opened)
parser.add_argument("--config", type=argparse.FileType("r"))
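
The type parameter accepts any callable that takes a string and returns a value, so you can write your own validators. The sketch below assumes you want to reject zero and negative counts; positive_int is a made-up helper name, while argparse.ArgumentTypeError is the exception argparse expects a type function to raise for a bad value.

```python
import argparse


def positive_int(value: str) -> int:
    """Convert value to an int and reject anything below 1."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer")
    if number <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} must be greater than zero")
    return number


parser = argparse.ArgumentParser(prog="demo.py")
parser.add_argument("--count", type=positive_int, default=10)

args = parser.parse_args(["--count", "3"])
print(args.count)  # 3

# A rejected value makes argparse print the message and exit with code 2.
try:
    parser.parse_args(["--count", "0"])
    bad_exit_code = 0
except SystemExit as exc:
    bad_exit_code = exc.code
print(bad_exit_code)  # 2
```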

Choices (restricted values)

parser.add_argument(
    "--format",
    choices=["json", "csv", "xml"],
    default="json",
    help="Output format (default: json)"
)

parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level"
)
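
Because choices guarantees the value is one of the listed strings, it is safe to use it directly afterwards. As an illustration, this sketch maps the --log-level choice onto the standard logging module; the logger name "pipeline" is only an example.

```python
import argparse
import logging

parser = argparse.ArgumentParser(prog="demo.py")
parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level",
)

args = parser.parse_args(["--log-level", "DEBUG"])

# Each choice matches a constant of the same name in the logging module.
level = getattr(logging, args.log_level)
logging.basicConfig(level=level)

logger = logging.getLogger("pipeline")
logger.debug("Debug messages are visible at this level")
```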

Required optional arguments

parser.add_argument(
    "--api-key",
    required=True,
    help="API key (required)"
)

Exit Codes

Exit codes tell the calling process whether your script succeeded or failed.

import sys

def main():
    # ... do work ...
    error_occurred = False  # Placeholder: set to True when a step fails

    if error_occurred:
        print("Error: something went wrong", file=sys.stderr)
        sys.exit(1)  # Non-zero = failure

    print("Success!")
    sys.exit(0)  # Zero = success

Common exit codes:

0: Success
1: General error
2: Command-line usage error (argparse exits with this code when arguments are invalid)
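
Here is how a calling process sees those codes. The sketch uses subprocess to launch tiny child Python programs. The helper name run_step and the inline snippets are invented for the example; in a real pipeline the command would be your own script.

```python
import subprocess
import sys


def run_step(code: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


ok = run_step("import sys; sys.exit(0)")
failed = run_step("import sys; sys.exit(1)")

print(ok)      # 0
print(failed)  # 1

# A scheduler or shell script makes the same check before the next step.
if failed != 0:
    print("Step failed, stopping the pipeline", file=sys.stderr)
```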

⌨️ Hands-on: Create a script greet.py that accepts a --name argument (default: "World") and prints "Hello, {name}!". Add a --loud flag that makes it print in uppercase.

Environment Variables

Sometimes you want to configure scripts via environment variables instead of command-line arguments.

💡 Why use environment variables?

Security: Never hardcode API keys or passwords in your code. Using environment variables keeps secrets out of your git history.

CI/CD: Cloud platforms and automation tools use environment variables to configure pipelines without changing the code.

Defaults: They are perfect for setting "safe" defaults that can be overridden by explicit CLI arguments.

import argparse
import os
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api-key",
        default=os.environ.get("API_KEY"),
        help="API key (or set API_KEY environment variable)"
    )
    args = parser.parse_args()

    if not args.api_key:
        print("Error: API key required", file=sys.stderr)
        sys.exit(1)

    print(f"Using API key: {args.api_key[:4]}...")

if __name__ == "__main__":
    main()

Usage:

# Via argument
python script.py --api-key secret123

# Via environment variable
export API_KEY=secret123
python script.py

# Inline (Linux/macOS)
API_KEY=secret123 python script.py

⚠️ Never print full API keys or passwords! The example above only prints the first 4 characters.

Best Practices

Always add descriptions to your parser and arguments
Use type hints in your functions that receive parsed arguments
Return exit codes to indicate success/failure
Print errors to stderr using print(..., file=sys.stderr)
Support --help (argparse does this automatically)
Use if __name__ == "__main__" to allow importing without running

🧠 Knowledge Check

Why is using input() (which waits for someone to type a value) a poor fit for automated data pipelines?
What does an exit code of 0 usually signal to the operating system?
How does using argparse help other people use your script?
