Azure Setup and Account Access
Data pipelines are typically run from the command line, not by clicking buttons in an IDE. In this section, you'll learn professional habits for running Python scripts and handling command-line arguments.
# Run a Python script
python script.py
# Explicitly use Python 3.11
python3.11 script.py
# Run a module (useful for packages)
python -m mypackage.script
# Pass arguments to your script
python process.py input.csv output.json
# With named flags
python process.py --input data.csv --output result.json --verbose
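Whatever you type after the script name reaches your program as a plain list of strings, `sys.argv`. The minimal sketch below prints that list so you can see exactly what a script receives. The file name `show_args.py` and the function `show_arguments` are hypothetical, used only for illustration.

```python
import sys


def show_arguments(argv: list[str]) -> list[str]:
    """Return the arguments that follow the script name."""
    # argv[0] is the script path; the rest is exactly what was typed, as strings
    return argv[1:]


if __name__ == "__main__":
    for position, arg in enumerate(show_arguments(sys.argv), start=1):
        print(f"argument {position}: {arg!r}")
```

Running `python show_args.py --input data.csv --verbose` would list `'--input'`, `'data.csv'` and `'--verbose'` as three separate strings. Nothing interprets `--verbose` as a flag at this level. That interpretation is the job of a parser such as argparse.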
<aside> 💡 When your virtual environment is activated, python points to the venv's own interpreter and the packages installed in it. Always activate your venv before running scripts!
</aside>
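To check which interpreter a script is actually running under, you can inspect the `sys` module. This sketch relies on how environments created with `venv` behave: `sys.prefix` points at the environment directory, while `sys.base_prefix` still points at the base Python installation. The function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the base installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```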
The argparse Module
The argparse module is Python's built-in (standard library) way to handle command-line arguments.
<aside> 💡 Why use argparse?
- It builds a --help menu automatically based on your code.
- It supports long-form flags (like --verbose) and short forms (like -v), exactly like standard Linux/Mac tools.
</aside>
# process.py
import argparse


def main():
    parser = argparse.ArgumentParser(
        description="Process a data file."
    )
    parser.add_argument("input", help="Input file path")
    parser.add_argument("output", help="Output file path")
    args = parser.parse_args()
    print(f"Input: {args.input}")
    print(f"Output: {args.output}")


if __name__ == "__main__":
    main()
Run it:
python process.py data.csv result.json
# Output:
# Input: data.csv
# Output: result.json
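Called with no argument, `parse_args()` reads `sys.argv[1:]`. You can also pass it an explicit list of strings, which lets you try a parser without going through the shell. This is a minimal sketch that rebuilds the same two-argument parser as process.py. The helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the same two-argument parser as process.py."""
    parser = argparse.ArgumentParser(description="Process a data file.")
    parser.add_argument("input", help="Input file path")
    parser.add_argument("output", help="Output file path")
    return parser


# An explicit list stands in for what would otherwise come from the command line
args = build_parser().parse_args(["data.csv", "result.json"])
print(f"Input: {args.input}")    # Input: data.csv
print(f"Output: {args.output}")  # Output: result.json
```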
argparse generates help text automatically:
python process.py --help
# Output (Python 3.10+; older versions label the last section "optional arguments:"):
# usage: process.py [-h] input output
#
# Process a data file.
#
# positional arguments:
#   input       Input file path
#   output      Output file path
#
# options:
#   -h, --help  show this help message and exit
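The help text is assembled from the parser's `description` and each argument's `help=` string. `format_help()` returns that same text as a string without exiting, which is handy for inspecting it (for example, in a test). In this sketch, `prog="process.py"` is set explicitly so the usage line names the script regardless of how the snippet is run. By default, argparse derives the program name from `sys.argv[0]`.

```python
import argparse

parser = argparse.ArgumentParser(prog="process.py", description="Process a data file.")
parser.add_argument("input", help="Input file path")
parser.add_argument("output", help="Output file path")

# format_help() builds the text that --help would print, but returns it instead
help_text = parser.format_help()
print(help_text)
```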
Positional arguments are required and are matched by their position on the command line:
parser.add_argument("input_file", help="The file to process")
parser.add_argument("output_file", help="Where to save results")
Usage: python script.py input.csv output.json
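If a required positional argument is missing, argparse prints the usage line plus an error naming the missing argument to stderr, then exits with status 2. The sketch below catches the `SystemExit` that argparse raises only so the exit code is visible. In a real script you would let it propagate.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument("input_file", help="The file to process")
parser.add_argument("output_file", help="Where to save results")

try:
    # Only one of the two required positional arguments is supplied
    parser.parse_args(["input.csv"])
except SystemExit as exc:
    # argparse has already written the usage line and an error to stderr
    exit_code = exc.code
    print(f"argparse exited with code {exit_code}")
```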
Optional arguments are identified by name, so they can appear in any order:
parser.add_argument(
    "--verbose", "-v",
    action="store_true",
    help="Enable verbose output"
)
parser.add_argument(
    "--limit", "-l",
    type=int,
    default=100,
    help="Maximum number of records (default: 100)"
)
Usage: python script.py input.csv output.json --verbose --limit 50
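To see what the two optional arguments above produce, this sketch parses two explicit argument lists. When a flag is omitted, `store_true` yields `False` and `--limit` falls back to its default. When the flags are given (here in short form), `type=int` turns the string `"50"` into the integer `50`.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument("input_file")
parser.add_argument("output_file")
parser.add_argument("--verbose", "-v", action="store_true", help="Enable verbose output")
parser.add_argument("--limit", "-l", type=int, default=100,
                    help="Maximum number of records (default: 100)")

# Flags omitted: verbose is False and limit uses its default
defaults = parser.parse_args(["input.csv", "output.json"])
print(defaults.verbose, defaults.limit)  # False 100

# Short forms work too; "50" is converted to the int 50
custom = parser.parse_args(["input.csv", "output.json", "-v", "-l", "50"])
print(custom.verbose, custom.limit)  # True 50
```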
<aside> ⚠️ Optional arguments start with -- (or - for short form). Positional arguments don't have dashes.
</aside>
#!/usr/bin/env python3
"""
Data processing pipeline script.

Usage:
    python pipeline.py input.csv output.json
    python pipeline.py input.csv output.json --verbose --skip-header
"""
import argparse
import sys


def create_parser() -> argparse.ArgumentParser:
    """Create and configure the argument parser."""
    parser = argparse.ArgumentParser(
        description="Process CSV data and output JSON.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  python pipeline.py data.csv result.json
  python pipeline.py data.csv result.json --verbose
  python pipeline.py data.csv result.json --limit 100
"""
    )

    # Positional arguments
    parser.add_argument(
        "input",
        help="Path to input CSV file"
    )
    parser.add_argument(
        "output",
        help="Path to output JSON file"
    )

    # Optional arguments
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output"
    )
    parser.add_argument(
        "--limit", "-l",
        type=int,
        default=None,
        help="Limit number of records to process"
    )
    parser.add_argument(
        "--skip-header",
        action="store_true",
        help="Skip the first row of the CSV"
    )
    return parser


def process_data(input_path: str, output_path: str,
                 verbose: bool = False, limit: int | None = None,
                 skip_header: bool = False) -> None:
    """Process data from input to output."""
    if verbose:
        print(f"Reading from: {input_path}")
        print(f"Writing to: {output_path}")
        if limit is not None:  # "is not None" so that --limit 0 is still reported
            print(f"Limiting to {limit} records")

    # Actual processing would go here
    print("Processing complete!")


def main() -> int:
    """Main entry point."""
    parser = create_parser()
    args = parser.parse_args()

    try:
        process_data(
            input_path=args.input,
            output_path=args.output,
            verbose=args.verbose,
            limit=args.limit,
            skip_header=args.skip_header  # --skip-header becomes args.skip_header
        )
        return 0  # Success
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1  # Failure


if __name__ == "__main__":
    sys.exit(main())
<aside> 💡 Having main() return an exit code and calling sys.exit(main()) is a widely used convention. Exit code 0 means success, and any non-zero code means failure.
</aside>
# Integer argument
parser.add_argument("--count", type=int, default=10)
# Float argument
parser.add_argument("--threshold", type=float, default=0.5)
# File argument: argparse opens the file for reading and exits with an error
# if it can't; args.config is an open file object, not a path string
parser.add_argument("--config", type=argparse.FileType("r"))
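`type=` accepts any callable that takes a string, not just built-ins like `int` and `float`. Raising `argparse.ArgumentTypeError` from that callable makes argparse show your message and exit with status 2. The validator `positive_int` below is a hypothetical example.

```python
import argparse


def positive_int(value: str) -> int:
    """Hypothetical type function: accept only integers greater than zero."""
    number = int(value)  # a ValueError here is reported as an invalid value
    if number <= 0:
        # The ArgumentTypeError message is shown to the user as-is
        raise argparse.ArgumentTypeError(f"{value!r} is not a positive integer")
    return number


parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument("--count", type=positive_int, default=10)

print(parser.parse_args(["--count", "25"]).count)  # 25

try:
    parser.parse_args(["--count", "0"])
except SystemExit as exc:
    # argparse has printed the usage line and the error message to stderr
    rejected_code = exc.code
```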
parser.add_argument(
    "--format",
    choices=["json", "csv", "xml"],
    default="json",
    help="Output format (default: json)"
)
parser.add_argument(
    "--log-level",
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level"
)
parser.add_argument(
    "--api-key",
    required=True,
    help="API key (required)"
)
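`choices` is case-sensitive, so `--log-level debug` would be rejected by the parser above. Because argparse applies `type` before checking `choices`, one common workaround, shown here as a sketch, is `type=str.upper`. User input is normalized first, and then checked against the uppercase list.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py")
parser.add_argument(
    "--log-level",
    # type runs before the choices check, so "debug" becomes "DEBUG" first
    type=str.upper,
    choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    default="INFO",
    help="Logging level (case-insensitive)",
)

args = parser.parse_args(["--log-level", "debug"])
print(args.log_level)  # DEBUG
```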
Exit codes tell the calling process whether your script succeeded or failed.
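import sys


def main():
    # ... do work ...
    error_occurred = False  # placeholder: set this from your own checks

    if error_occurred:
        print("Error: something went wrong", file=sys.stderr)
        sys.exit(1)  # Non-zero = failure

    print("Success!")
    sys.exit(0)  # Zero = success


if __name__ == "__main__":
    main()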
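From the caller's side, the exit code shows up as a return code. The sketch below uses `subprocess.run` to launch a small inline child script, a hypothetical stand-in for your own script. It shows that the caller receives the exit code, stdout, and stderr separately, which is why error messages belong on stderr. In a Linux/macOS shell, the same number is available as `$?` right after a command finishes.

```python
import subprocess
import sys

# Hypothetical child script: writes to both streams, then exits with code 3
child_code = (
    "import sys\n"
    "print('working...')\n"
    "print('Error: something went wrong', file=sys.stderr)\n"
    "sys.exit(3)\n"
)

# Run it with the current interpreter, capturing stdout and stderr separately
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

print(f"exit code: {result.returncode}")   # exit code: 3
print(f"stdout: {result.stdout.strip()}")  # stdout: working...
print(f"stderr: {result.stderr.strip()}")  # stderr: Error: something went wrong
```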
Common exit codes:
- 0: Success
- 1: General error
- 2: Command-line usage error (argparse exits with 2 when it rejects the arguments)

<aside> ⌨️ Hands-on: Create a script greet.py that accepts a --name argument (default: "World") and prints "Hello, {name}!". Add a --loud flag that makes it print in uppercase.
</aside>
Sometimes you want to configure scripts via environment variables instead of command-line arguments.
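<aside> 💡 Why use environment variables?
- Secrets such as API keys stay out of your code and out of your shell history, which records every argument you type.
- Schedulers, containers, and CI systems commonly pass configuration to scripts this way.
</aside>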
import os
import sys
import argparse


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api-key",
        default=os.environ.get("API_KEY"),
        help="API key (or set API_KEY environment variable)"
    )
    args = parser.parse_args()

    if not args.api_key:
        print("Error: API key required", file=sys.stderr)
        sys.exit(1)

    print(f"Using API key: {args.api_key[:4]}...")


if __name__ == "__main__":
    main()
Usage:
# Via argument
python script.py --api-key secret123
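# Via environment variable (Linux/macOS)
export API_KEY=secret123
python script.py
# Via environment variable (Windows PowerShell)
$env:API_KEY = "secret123"
python script.py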
# Inline (Linux/macOS)
API_KEY=secret123 python script.py
<aside> ⚠️ Never print full API keys or passwords! The example above only prints the first 4 characters.
</aside>
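Environment variables are always strings, so non-string settings need converting. When an argparse default is a string, argparse runs it through `type=` just like a command-line value, which gives a simple precedence: command-line flag, then environment variable, then built-in fallback. In this sketch, the variable `BATCH_SIZE`, the flag `--batch-size`, and the helper `build_parser` are hypothetical. The environment is passed in as a mapping so it can be tested. A real script would call `build_parser(os.environ)`.

```python
import argparse
from collections.abc import Mapping


def build_parser(environ: Mapping[str, str]) -> argparse.ArgumentParser:
    """Build a parser whose --batch-size default comes from the environment."""
    parser = argparse.ArgumentParser(prog="script.py")
    parser.add_argument(
        "--batch-size",
        type=int,
        # "100" is the fallback; type=int also converts this string default
        default=environ.get("BATCH_SIZE", "100"),
        help="Records per batch (or set BATCH_SIZE)",
    )
    return parser


# Precedence: command-line flag > environment variable > built-in default
print(build_parser({}).parse_args([]).batch_size)                     # 100
print(build_parser({"BATCH_SIZE": "250"}).parse_args([]).batch_size)  # 250
print(build_parser({"BATCH_SIZE": "250"})
      .parse_args(["--batch-size", "5"]).batch_size)                  # 5
```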
Key habits:
- Send error messages to stderr with print(..., file=sys.stderr)
- Provide --help (argparse does this automatically)
- Use if __name__ == "__main__" to allow importing without running

Check your understanding:
- Why is input() (asking for user typing) bad for automated data pipelines?
- What does an exit code of 0 usually signal to the operating system?
- How does argparse help other people use your script?

Next lesson: Reading Errors and Debugging