Data pipelines are typically run from the command line, not by clicking buttons in an IDE. In this section, you'll learn professional habits for running Python scripts and handling command-line arguments.
# Run a Python script
python script.py
# Explicitly use Python 3.11
python3.11 script.py
# Run a module (useful for packages)
python -m mypackage.script
# Pass arguments to your script
python process.py input.csv output.json
# With named flags
python process.py --input data.csv --output result.json --verbose
💡 When your virtual environment is activated, python will point to the correct version. Always activate your venv before running scripts!
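If you are unsure which interpreter is actually running your script, you can ask Python itself. The following is a minimal sketch; the helper names `describe_interpreter` and `in_virtual_environment` are made up for illustration, and the venv check assumes an environment created with the standard `venv` module.

```python
import sys


def describe_interpreter() -> str:
    """Return the path and version of the interpreter running this code."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"{sys.executable} (Python {version})"


def in_virtual_environment() -> bool:
    """Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(describe_interpreter())
    print(f"Virtual environment active: {in_virtual_environment()}")
```

Run it once with your venv activated and once without; the interpreter path should change between the two runs.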
argparse Module
The argparse module is Python's built-in way to handle command-line arguments professionally.
💡 Why use argparse?
- Input Validation: When you declare a type (such as type=int), it checks that users provide the right kind of data and reports a clear error if they don't.
- Documentation: It generates a complete --help menu automatically based on your code.
- Professionalism: It handles flags (like --verbose) and short forms (like -v) exactly like standard Linux/Mac tools.
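To see the validation point in action, here is a small sketch. The parser and the helper `try_parse` are hypothetical and exist only for this demonstration. When a value cannot be converted, argparse prints a usage message to stderr and exits; the helper catches that exit so you can inspect the code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with one typed option and one flag."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--limit", "-l", type=int, default=100)
    parser.add_argument("--verbose", "-v", action="store_true")
    return parser


def try_parse(argv: list[str]) -> int:
    """Return 0 if argv parses, otherwise the exit code argparse chose."""
    try:
        build_parser().parse_args(argv)
        return 0
    except SystemExit as exc:  # argparse exits instead of raising a normal error
        return exc.code


if __name__ == "__main__":
    print(try_parse(["--limit", "50", "-v"]))  # valid input
    print(try_parse(["--limit", "fifty"]))     # "fifty" is not an int
```

The second call reports exit code 2, the conventional code for a command-line usage error.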
# process.py
import argparse


def main():
    parser = argparse.ArgumentParser(
        description="Process a data file."
    )
    parser.add_argument("input", help="Input file path")
    parser.add_argument("output", help="Output file path")
    args = parser.parse_args()
    print(f"Input: {args.input}")
    print(f"Output: {args.output}")


if __name__ == "__main__":
    main()
Run it:
python process.py data.csv result.json
# Output:
# Input: data.csv
# Output: result.json
argparse generates help text automatically:
python process.py --help
# Output:
# usage: process.py [-h] input output
#
# Process a data file.
#
# positional arguments:
#   input       Input file path
#   output      Output file path
#
# options:
#   -h, --help  show this help message and exit
# (Python 3.9 and earlier label the last section "optional arguments:")
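You do not need a terminal to experiment with a parser. As a rough sketch (reusing the same two positional arguments as process.py), you can pass a list to `parse_args()` instead of relying on `sys.argv`, and call `format_help()` to get the help text as a string. Both are handy in tests and notebooks.

```python
import argparse

parser = argparse.ArgumentParser(
    prog="process.py",
    description="Process a data file.",
)
parser.add_argument("input", help="Input file path")
parser.add_argument("output", help="Output file path")

# An explicit list replaces sys.argv[1:], so no command line is needed.
args = parser.parse_args(["data.csv", "result.json"])
print(args.input, args.output)

# format_help() returns the same text that --help would print.
help_text = parser.format_help()
print(help_text)
```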
parser.add_argument("input_file", help="The file to process")
parser.add_argument("output_file", help="Where to save results")
Usage: python script.py input.csv output.json
parser.add_argument(
    "--verbose", "-v",
    action="store_true",
    help="Enable verbose output"
)
parser.add_argument(
    "--limit", "-l",
    type=int,
    default=100,
    help="Maximum number of records (default: 100)"
)
Usage: python script.py input.csv output.json --verbose --limit 50
⚠️ Optional arguments start with -- (or - for short form). Positional arguments don't have dashes.
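The sketch below shows how the two kinds of arguments end up on the parsed result. The argument names are illustrative only. Note that a dash inside a long flag name becomes an underscore in the attribute name, and that optional arguments may appear before or after the positional ones.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument("input_file")                           # positional: no dashes
parser.add_argument("--skip-header", action="store_true")   # long flag only
parser.add_argument("--limit", "-l", type=int, default=100) # long and short form

# The short form -l comes first here, the long flag comes last.
args = parser.parse_args(["-l", "50", "input.csv", "--skip-header"])

print(args.input_file)   # the positional value
print(args.limit)        # already converted to int
print(args.skip_header)  # --skip-header is stored as skip_header
```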
#!/usr/bin/env python3
"""
Data processing pipeline script.

Usage:
    python pipeline.py input.csv output.json
    python pipeline.py input.csv output.json --verbose --skip-header
"""
import argparse
import sys


def create_parser() -> argparse.ArgumentParser:
    """Create and configure the argument parser."""
    parser = argparse.ArgumentParser(
        description="Process CSV data and output JSON.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
    python pipeline.py data.csv result.json
    python pipeline.py data.csv result.json --verbose
    python pipeline.py data.csv result.json --limit 100
"""
    )
    # Positional arguments
    parser.add_argument(
        "input",
        help="Path to input CSV file"
    )
    parser.add_argument(
        "output",
        help="Path to output JSON file"
    )
    # Optional arguments
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--limit", "-l",
        type=int,
        default=None,
        help="Limit number of records to process"
    )
    parser.add_argument(
        "--skip-header",
        action="store_true",
        help="Skip the first row of the CSV"
    )
    return parser


def process_data(input_path: str, output_path: str,
                 verbose: bool = False, limit: int | None = None,
                 skip_header: bool = False) -> None:
    """Process data from input to output."""
    if verbose:
        print(f"Reading from: {input_path}")
        print(f"Writing to: {output_path}")
        if limit:
            print(f"Limiting to {limit} records")
    # Actual processing would go here
    print("Processing complete!")


def main() -> int:
    """Main entry point."""
    parser = create_parser()
    args = parser.parse_args()
    try:
        process_data(
            input_path=args.input,
            output_path=args.output,
            verbose=args.verbose,
            limit=args.limit,
            skip_header=args.skip_header
        )
        return 0  # Success
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1  # Failure


if __name__ == "__main__":
    sys.exit(main())
💡 The pattern of main() returning an exit code and sys.exit(main()) is professional practice. Exit code 0 means success, non-zero means failure.
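To see what the calling process receives, here is a small sketch that launches child Python processes and reads their exit codes. The helper `run_and_report` is hypothetical; a scheduler or shell script would read the same value (in a Linux/macOS shell it is available as `$?`).

```python
import subprocess
import sys


def run_and_report(code: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


# The child calls sys.exit(), just like sys.exit(main()) would.
ok = run_and_report("import sys; sys.exit(0)")
failed = run_and_report("import sys; sys.exit(1)")

print(f"ok={ok}, failed={failed}")
```

This is how pipeline tools typically decide whether to continue to the next step: they look at the number, not at what the script printed.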
# Integer argument
parser.add_argument("--count", type=int, default=10)
# Float argument
parser.add_argument("--threshold", type=float, default=0.5)
# Readable file (argparse opens it for you; args.config is an open file object, not a path)
parser.add_argument("--config", type=argparse.FileType("r"))
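The `type` parameter accepts any callable that takes a string and returns a value, so you can write your own validators. The sketch below assumes a made-up rule (the count must be at least 1); `positive_int` is an illustrative name, not part of argparse. Raising `argparse.ArgumentTypeError` lets argparse show your message alongside the usage text.

```python
import argparse


def positive_int(text: str) -> int:
    """Convert text to an int and reject values below 1."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer") from None
    if value < 1:
        raise argparse.ArgumentTypeError(f"{text!r} must be at least 1")
    return value


parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--count", type=positive_int, default=10)

args = parser.parse_args(["--count", "3"])
print(args.count)
```

Passing `--count 0` to this parser would make argparse print the custom message and exit with code 2.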
parser.add_argument(
    "--format",
    choices=["json", "csv", "xml"],
    default="json",
    help="Output format (default: json)"
)
parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level"
)
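One possible way to use a `--log-level` choice is to map the chosen name onto the constants in the standard `logging` module. This is a sketch under that assumption; the lesson does not prescribe how the value is consumed.

```python
import argparse
import logging

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
)
args = parser.parse_args(["--log-level", "DEBUG"])

# The choice names match attribute names on the logging module,
# so getattr turns "DEBUG" into logging.DEBUG.
level = getattr(logging, args.log_level)
logging.basicConfig(level=level)
logging.debug("Debug logging is on")
```

Because `choices` rejects anything outside the list, the `getattr` lookup cannot be reached with an unexpected name.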
parser.add_argument(
    "--api-key",
    required=True,
    help="API key (required)"
)
Exit codes tell the calling process whether your script succeeded or failed.
import sys


def main():
    # ... do work ...
    error_occurred = False  # Placeholder: set to True when the work fails
    if error_occurred:
        print("Error: something went wrong", file=sys.stderr)
        sys.exit(1)  # Non-zero = failure
    print("Success!")
    sys.exit(0)  # Zero = success
Common exit codes:
- 0: Success
- 1: General error
- 2: Command-line usage error
⌨️ Hands-on: Create a script greet.py that accepts a --name argument (default: "World") and prints "Hello, {name}!". Add a --loud flag that makes it print in uppercase.
Sometimes you want to configure scripts via environment variables instead of command-line arguments.
💡 Why use environment variables?
- Security: Never hardcode API keys or passwords in your code. Using environment variables keeps secrets out of your git history.
- CI/CD: Cloud platforms and automation tools use environment variables to configure pipelines without changing the code.
- Defaults: They are perfect for setting "safe" defaults that can be overridden by explicit CLI arguments.
import argparse
import os
import sys  # Needed for sys.stderr and sys.exit below


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api-key",
        default=os.environ.get("API_KEY"),
        help="API key (or set API_KEY environment variable)"
    )
    args = parser.parse_args()
    if not args.api_key:
        print("Error: API key required", file=sys.stderr)
        sys.exit(1)
    print(f"Using API key: {args.api_key[:4]}...")


if __name__ == "__main__":
    main()
Usage:
# Via argument
python script.py --api-key secret123
# Via environment variable
export API_KEY=secret123
python script.py
# Inline (Linux/macOS)
API_KEY=secret123 python script.py
⚠️ Never print full API keys or passwords! The example above only prints the first 4 characters.
- Write error messages to stderr with print(..., file=sys.stderr)
- Provide --help (argparse does this automatically)
- Use if __name__ == "__main__" to allow importing without running
- Why is input() (asking for user typing) bad for automated data pipelines?
- What does exit code 0 usually signal to the operating system?
- How does argparse help other people use your script?
Next lesson: Reading Errors and Debugging