Linting and Formatting with Ruff
In the Core program you used Prettier to auto-format JavaScript. Python has its own tools for this. This chapter introduces ruff, a fast linter and formatter that catches bugs and keeps your code consistent.
By the end of this chapter, you should be able to lint and format your code with ruff, configure basic rules, and set up your editor to format on save.
When you work in a team, everyone writes code slightly differently: different indentation, different import ordering, different quoting styles. These differences create noisy diffs in pull requests and hide real changes behind style changes.
A formatter rewrites your code to a consistent style automatically. A linter goes further: it catches potential bugs, unused imports, and patterns that often lead to errors.
Ruff does both. It replaces older tools like flake8 (linter), black (formatter), and isort (import sorter) in a single tool.
Add ruff as a dev dependency. It is not part of your pipeline code, so it does not belong in your production dependencies.
pip install ruff
If you use uv:
uv add --dev ruff
ruff check
Run the linter on your source code:
ruff check src/
Ruff reports issues with a rule code (e.g. F401 for unused imports) and a description. Fix them manually, or let ruff fix what it can:
ruff check src/ --fix
<aside>
⌨️ Hands on: Run ruff check on your Week 2 code. Fix any issues it reports. Note which rule codes appear most often.
</aside>
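To make the rule codes concrete, here is a small hypothetical module (the file name, function name, and data are made up for illustration). It runs without errors, but ruff check would be expected to flag it. The comments name the rules that should apply under ruff's default rule set; treat them as a guide and compare with what ruff actually prints.

```python
# lint_demo.py: a hypothetical file that runs, but is not clean.
import os  # F401: imported but never used
import json


def parse_record(raw):
    """Parse one JSON record and return its "name" field, or None."""
    record = json.loads(raw)
    unused_total = len(record)  # F841: local variable assigned but never used
    name = record.get("name")
    if name == None:  # E711: comparison to None should be `name is None`
        return None
    return name.strip()
```

Expect --fix to remove the unused import for you. The other two will likely need a manual edit, because changing them automatically could alter what the code does.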
ruff format
The formatter rewrites your files to a consistent style (line length, quotes, trailing commas):
# Preview what would change
ruff format src/ --check
# Apply formatting
ruff format src/
<aside>
⌨️ Hands on: Run ruff format --check on your code. If it reports changes, run ruff format and inspect the diff with git diff.
</aside>
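A formatter changes layout, not behaviour. The sketch below uses a made-up total function to compare a messy snippet with roughly what ruff format would be expected to produce. It then uses the standard library ast module to check that both versions parse to the same syntax tree. The CLEAN text is written by hand as an approximation, not captured from a real ruff run.

```python
import ast

# Hypothetical "before" source: valid Python with inconsistent spacing and quotes.
MESSY = """
def total(  values,tax = 0.2 ):
    labels = {'net':sum( values )}
    return labels['net']*( 1+tax )
"""

# Roughly what a formatter such as ruff format would be expected to produce.
CLEAN = """
def total(values, tax=0.2):
    labels = {"net": sum(values)}
    return labels["net"] * (1 + tax)
"""


def same_program(source_a, source_b):
    """Return True if both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(same_program(MESSY, CLEAN))
```

Because spacing, quote style, and redundant parentheses are not part of the syntax tree, the two versions compare as equal. This is why a formatting-only diff is safe to accept without re-testing the logic.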
Ruff works out of the box, but you can customize it in pyproject.toml:
[tool.ruff]
line-length = 100
[tool.ruff.lint]
select = ["E", "F", "I"] # Errors, pyflakes, isort
ignore = ["E501"] # Skip line-length warnings
Common rule sets:
E: style errors (PEP 8)
F: logical errors (unused imports, undefined names)
I: import sorting
<aside> 💡 Start with the defaults. Add rules as your project grows. Do not spend time configuring every option upfront.
</aside>
Set up VS Code to lint and format on save so you never commit unformatted code:
Install the Ruff extension (charliermarsh.ruff).
Add these settings to your VS Code settings.json:
{
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll.ruff": "explicit",
"source.organizeImports.ruff": "explicit"
}
}
}
This formats your code and fixes lint issues every time you save a Python file.
<aside> ⌨️ Hands on: Install the Ruff extension, add the settings above, and save a Python file. Confirm it auto-formats.
</aside>
<aside> 📘 Core program connection: In the Core program you set up Prettier for JavaScript auto-formatting. Ruff is the Python equivalent: same idea, different language.
</aside>
Run ruff as part of your development routine:
Run ruff check and ruff format --check to verify.
<aside> 🤓 Curious Geek: Why ruff is so fast
Ruff is written in Rust, not Python. On a typical project it can finish linting in milliseconds, where older Python-based tools can take seconds. This makes it practical to run on every save without slowing you down.
</aside>
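If you want a single command for the verification step, the routine can be sketched as a small Python helper. This is one possible approach, not something the chapter prescribes. The names RUFF_COMMANDS, run_checks, and main are made up for illustration, and the src/ path is assumed from the earlier examples. It relies on both ruff commands exiting with a non-zero code when they find problems.

```python
import subprocess

# The two verification commands from this chapter (src/ is an assumed path).
RUFF_COMMANDS = [
    ["ruff", "check", "src/"],
    ["ruff", "format", "src/", "--check"],
]


def run_checks(commands=RUFF_COMMANDS):
    """Run each command; return True only if all of them exit with code 0."""
    all_passed = True
    for command in commands:
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            # The executable is not installed or not on PATH.
            print(f"not found: {command[0]}")
            all_passed = False
            continue
        if result.returncode != 0:
            print(f"failed: {' '.join(command)}")
            all_passed = False
    return all_passed


def main():
    """Return a process exit code: 0 if every check passed, 1 otherwise."""
    return 0 if run_checks() else 1
```

To use it as a script, pass the result of main() to sys.exit so that a failed check stops whatever runs next. The helper runs every command even after a failure, so you see all problems in one pass.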
Run ruff check on your Week 2 code and fix all reported issues.
Run ruff format and review the changes it makes.
Add a [tool.ruff] section to your pyproject.toml with a custom line length.
What is the difference between ruff check and ruff format?
What does the --fix flag do when running ruff check?
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
