Almost every real program needs to work with files: reading configuration, processing data, or saving results.
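In Python, file operations are built in, so you don't need to install any extra libraries to get started. This chapter covers opening and closing files safely, reading and writing text, building portable paths with pathlib, and working with CSV and JSON files.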
The built-in function to work with files is open().
file = open("example.txt", "r")
This opens a file called example.txt in read mode: the second argument is the file mode, which controls whether you can read from, write to, or append to the file. "r" opens for reading; "w" opens for writing and erases the existing contents; "a" opens for writing and appends. Files are system resources: if you forget to close them, you can cause bugs or resource leaks.
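To make the resource point concrete, here is a minimal sketch of closing a file by hand with try/finally. The scratch directory and the file contents are assumptions made only so the snippet runs anywhere without touching real files.

```python
import tempfile
from pathlib import Path

# Assumption: a throwaway directory so the example leaves nothing behind.
tmp = tempfile.TemporaryDirectory()
path = Path(tmp.name) / "example.txt"
path.write_text("hello\n", encoding="utf-8")

file = open(path, "r", encoding="utf-8")
try:
    content = file.read()
finally:
    # Runs whether or not read() raised, so the handle is always released.
    file.close()

closed_after = file.closed  # True once close() has been called
print(content, closed_after)
tmp.cleanup()
```

This works, but the cleanup is easy to forget, which is the motivation for the pattern described next.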
with Statement (Recommended)
Python provides a safer pattern using the with statement. It automatically closes the file for you, even if your code crashes inside the block. Any object that implements this auto-cleanup protocol is called a context manager: file objects are the canonical example, but database connections, locks, and temp directories all implement the same pattern.
with open("example.txt", "r") as file:
    content = file.read()
    print(content)
Once the with block ends, the file is closed, no matter what. You should always prefer this pattern.
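The "no matter what" claim can be checked directly. The sketch below raises an error on purpose inside a with block and then inspects the file object afterwards. The scratch directory, the variable name handle, and the simulated ValueError are all illustrative assumptions.

```python
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()  # assumption: scratch space for the demo
path = Path(tmp.name) / "example.txt"
path.write_text("some text\n", encoding="utf-8")

try:
    with open(path, "r", encoding="utf-8") as file:
        handle = file  # keep a reference so we can inspect it afterwards
        raise ValueError("simulated crash inside the block")
except ValueError:
    pass  # the error still propagates out of the with block; we catch it here

closed_after_error = handle.closed
print(closed_after_error)  # True
tmp.cleanup()
```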
When opening a file, you tell Python how you want to use it through the mode argument. If you leave the mode out, Python opens the file for reading ("r").
Common modes:
| Mode | Meaning |
|---|---|
| "r" | Read (file must exist) |
| "w" | Write (overwrites file if it exists) |
| "a" | Append (adds to end of file) |
| "x" | Create (fails if file exists) |
Example:
with open("output.txt", "w") as file:
    file.write("Hello, world!\n")
<aside>
⚠️ Be careful with "w": it deletes the file contents if the file already exists.
</aside>
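The difference between "w" and "x" is easiest to see side by side. This is a minimal sketch, assuming a scratch directory and made-up file contents: it writes a file twice with "w", then tries to create the same file with "x".

```python
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()  # assumption: scratch directory for the demo
path = Path(tmp.name) / "output.txt"

with open(path, "w", encoding="utf-8") as file:
    file.write("first version\n")

# Opening with "w" again truncates the file before anything is written.
with open(path, "w", encoding="utf-8") as file:
    file.write("second version\n")

after_w = path.read_text(encoding="utf-8")

# "x" refuses to touch an existing file and raises FileExistsError instead.
try:
    with open(path, "x", encoding="utf-8") as file:
        file.write("never written\n")
    x_failed = False
except FileExistsError:
    x_failed = True

after_x = path.read_text(encoding="utf-8")
print(repr(after_w), x_failed)
tmp.cleanup()
```

When overwriting would be a costly mistake, "x" is the safer choice, because the failure is loud rather than silent.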
with open("example.txt", "r") as file:
    content = file.read()
    print(content)
This loads the entire file into memory. Fine for small files, but not ideal for large ones.
with open("example.txt", "r") as file:
    for line in file:
        print(line.strip())
This is memory-efficient and very common in real programs.
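As an illustration of why line-by-line iteration scales, the sketch below computes two counts while holding only one line in memory at a time. The sample text and the variable names are assumptions for the demo.

```python
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()  # assumption: scratch directory for the demo
path = Path(tmp.name) / "example.txt"
path.write_text("alpha\n\nbeta\ngamma\n", encoding="utf-8")

line_count = 0
non_empty_count = 0
with open(path, "r", encoding="utf-8") as file:
    for line in file:              # the file object yields one line at a time
        line_count += 1
        if line.strip():           # strip() drops the newline and surrounding spaces
            non_empty_count += 1

print(line_count, non_empty_count)
tmp.cleanup()
```

The same loop would work unchanged on a file far larger than available memory, because nothing from earlier lines is kept.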
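# Write: creates notes.txt, or overwrites it if it already exists
with open("notes.txt", "w") as file:
    file.write("First line\n")
    file.write("Second line\n")

# Append: adds to the end without touching the existing lines
with open("notes.txt", "a") as file:
    file.write("Another line\n")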
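Append mode is a natural fit for anything that grows over time, such as notes or logs. The sketch below wraps it in a small helper. The function name add_note is hypothetical and the scratch directory is an assumption.

```python
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()  # assumption: scratch directory for the demo
path = Path(tmp.name) / "notes.txt"

def add_note(note_path, text):
    """Hypothetical helper: append one line to the notes file."""
    with open(note_path, "a", encoding="utf-8") as file:
        file.write(text + "\n")

add_note(path, "First line")    # "a" creates the file if it is missing
add_note(path, "Second line")
add_note(path, "Another line")

with open(path, "r", encoding="utf-8") as file:
    lines = [line.rstrip("\n") for line in file]

print(lines)
tmp.cleanup()
```

Each call opens and closes the file on its own, so earlier notes survive even if a later call fails.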
Hard-coding file paths like "data/file.csv" can cause problems across operating systems: macOS and Linux use /, Windows uses \. Python's standard library includes pathlib, which makes paths safer and clearer: Path("data") / "file.csv" automatically picks the right separator on every OS, and the resulting Path object exposes operations like .exists(), .parent, .read_text(), and .glob() that you would otherwise stitch together from os.path.join, os.path.basename, and friends.
from pathlib import Path
data_dir = Path("data")
file_path = data_dir / "example.txt"
with open(file_path, "r") as file:
    print(file.read())
This works on macOS, Linux, and Windows.
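The Path operations mentioned earlier (.exists(), .parent, .read_text(), .glob()) can be tried together in a short sketch. The scratch directory and the extra file names other.txt and table.csv are assumptions added so that glob has something to match.

```python
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()  # assumption: scratch directory for the demo
data_dir = Path(tmp.name) / "data"
data_dir.mkdir()

file_path = data_dir / "example.txt"
existed_before = file_path.exists()          # nothing has been written yet

file_path.write_text("hello from pathlib\n", encoding="utf-8")
(data_dir / "other.txt").write_text("x\n", encoding="utf-8")
(data_dir / "table.csv").write_text("id\n1\n", encoding="utf-8")

content = file_path.read_text(encoding="utf-8")   # open, read, close in one call
parent_name = file_path.parent.name               # name of the containing folder
suffix = file_path.suffix                          # file extension, dot included
txt_names = sorted(p.name for p in data_dir.glob("*.txt"))

print(existed_before, parent_name, suffix, txt_names)
tmp.cleanup()
```

For small files, read_text() and write_text() can replace an explicit with open(...) block. For line-by-line processing, the with pattern is still the better fit.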
CSV (Comma-Separated Values) files are common for data exchange. Python includes a built-in csv module.
Example file (users.csv):
id,name,email
1,Alice,[email protected]
2,Bob,[email protected]
import csv
with open("users.csv", "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)
Output:
{'id': '1', 'name': 'Alice', 'email': '[email protected]'}
{'id': '2', 'name': 'Bob', 'email': '[email protected]'}
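Notice that each row is a dictionary keyed by the column names from the header line, and that every value is a string, including id.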
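Because the csv module hands back text only, numeric columns need an explicit conversion. This is a minimal sketch, assuming the same columns as users.csv. The email addresses are placeholders, and io.StringIO stands in for a real file so the snippet is self-contained.

```python
import csv
import io

# Assumption: same columns as users.csv, with placeholder email addresses.
csv_text = (
    "id,name,email\n"
    "1,Alice,alice@example.com\n"
    "2,Bob,bob@example.com\n"
)

reader = csv.DictReader(io.StringIO(csv_text))
users = []
for row in reader:
    row["id"] = int(row["id"])   # csv yields "1"; convert where a number is needed
    users.append(row)

print(users[0])
```

Converting at read time keeps the rest of the program from having to remember which fields are really numbers.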
JSON is a very common output format for APIs, configuration, and data exchange. Python includes the json module.
import json
data = {
    "id": 1,
    "name": "Alice",
    "active": True
}
with open("user.json", "w") as file:
    json.dump(data, file, indent=2)
This creates a nicely formatted file:
{
  "id": 1,
  "name": "Alice",
  "active": true
}
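Reading the file back is the mirror image of writing it. The sketch below writes the same dictionary and loads it again with json.load. The scratch directory is an assumption for the demo.

```python
import json
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()  # assumption: scratch directory for the demo
path = Path(tmp.name) / "user.json"

data = {"id": 1, "name": "Alice", "active": True}
with open(path, "w", encoding="utf-8") as file:
    json.dump(data, file, indent=2)

raw_text = path.read_text(encoding="utf-8")

with open(path, "r", encoding="utf-8") as file:
    loaded = json.load(file)     # JSON true/false come back as Python True/False

print(loaded == data)  # True
tmp.cleanup()
```

Unlike CSV, JSON keeps basic types, so the number and the boolean survive the round trip without manual conversion.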
This example reads a CSV file and writes a JSON file.
import csv
import json
from pathlib import Path

input_path = Path("users.csv")
output_path = Path("users.json")

# Read every CSV row into a list of dictionaries
users = []
with open(input_path, "r", newline="") as file:
    reader = csv.DictReader(file)
    for row in reader:
        users.append(row)

# Write the list out as formatted JSON
with open(output_path, "w") as file:
    json.dump(users, file, indent=2)
This is a common real-world pattern.
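Since the pattern recurs, it is often wrapped in a function. The sketch below is one possible shape, not a canonical one. The name csv_to_json, the explicit UTF-8 encoding, and the two-column sample data are assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path

def csv_to_json(input_path, output_path):
    """Hypothetical helper: convert a CSV file with a header row to a JSON list."""
    with open(input_path, "r", newline="", encoding="utf-8") as file:
        rows = list(csv.DictReader(file))
    with open(output_path, "w", encoding="utf-8") as file:
        json.dump(rows, file, indent=2)
    return len(rows)  # number of data rows converted

tmp = tempfile.TemporaryDirectory()  # assumption: scratch directory for the demo
input_path = Path(tmp.name) / "users.csv"
output_path = Path(tmp.name) / "users.json"
input_path.write_text("id,name\n1,Alice\n2,Bob\n", encoding="utf-8")

row_count = csv_to_json(input_path, output_path)
result = json.loads(output_path.read_text(encoding="utf-8"))

print(row_count, result)
tmp.cleanup()
```

Note that the values in the JSON output are still strings, because the CSV reader never converts types on its own.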
FileNotFoundError: [Errno 2] No such file or directory
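This usually means the path is wrong: the filename has a typo, or the script is running from a different working directory than the one the relative path assumes.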
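When this error appears, it helps to see where Python actually looked. The sketch below catches the exception and reports the resolved location. The helper name read_or_explain and the deliberately missing file name are hypothetical.

```python
from pathlib import Path

def read_or_explain(path):
    """Hypothetical helper: return the file's text, or a hint if it is missing."""
    path = Path(path)
    try:
        with open(path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        # resolve() turns a relative path into the absolute one Python tried.
        return f"Could not find {path.name!r}; looked in {path.resolve().parent}"

message = read_or_explain("definitely_missing_file_12345.txt")
print(message)
print("Current working directory:", Path.cwd())
```

Comparing the reported folder with where the file really lives usually reveals the mismatch quickly.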