Week 2 - Structuring Data Pipelines
Linting and Formatting with Ruff
In the Core program you used Prettier to auto-format JavaScript. Python has its own tools for this. This chapter introduces ruff, a fast linter and formatter that catches bugs and keeps your code consistent.
By the end of this chapter, you should be able to lint and format your code with ruff, configure basic rules, and set up your editor to format on save.
When you work in a team, everyone writes code slightly differently: different indentation, different import ordering, different quoting styles. These differences create noisy diffs in pull requests and hide real changes behind style changes.
A formatter rewrites your code to a consistent style automatically. A linter goes further: it catches potential bugs, unused imports, and patterns that often lead to errors.
Ruff does both. It replaces older tools like flake8 (linter), black (formatter), and isort (import sorter) in a single tool.
Add ruff as a dev dependency. It is not part of your pipeline code, so it does not belong in your production dependencies.
pip install ruff
If you use uv:
uv add --dev ruff
ruff check
Run the linter on your source code:
ruff check src/
Ruff reports issues with a rule code (e.g. F401 for unused imports) and a description. Fix them manually, or let ruff fix what it can:
ruff check src/ --fix
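To make the rule codes concrete, here is a small hypothetical module (the function name and data are illustrative, not taken from the course repo), annotated with the rules ruff would be expected to report under its default rule set. The code runs without errors, which is the point: these are problems a linter spots that Python itself will not complain about.

```python
import json
import os  # F401: `os` imported but unused


def parse_record(line):
    """Parse one JSON line and return its "id" field, or None if it is missing."""
    record = json.loads(line)
    raw_length = len(line)  # F841: local variable assigned but never used
    value = record.get("id")
    if value == None:  # E711: comparison to None should be `value is None`
        return None
    return value
```

With a recent ruff version, --fix should remove the unused import on its own. The other two fixes are typically treated as unsafe, so expect to review and apply them yourself.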
<aside>
⌨️ Hands on: Run ruff check on your Week 2 code. Fix any issues it reports. Note which rule codes appear most often.
</aside>
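One way to see which rule codes appear most often is to ask ruff for machine-readable output and tally it. The sketch below assumes a recent ruff version where `ruff check --output-format json` prints a JSON array whose entries carry a `code` field. The helper name `count_rule_codes` and the sample data are made up for illustration.

```python
import json
from collections import Counter


def count_rule_codes(ruff_json: str) -> Counter:
    """Tally diagnostics per rule code from ruff's JSON output."""
    diagnostics = json.loads(ruff_json)
    # Assumption: each entry has a "code" key; it may be null for syntax errors.
    return Counter(d.get("code") or "syntax-error" for d in diagnostics)


# Hypothetical sample, trimmed to the one field this sketch reads.
SAMPLE = '[{"code": "F401"}, {"code": "E711"}, {"code": "F401"}]'

for code, count in count_rule_codes(SAMPLE).most_common():
    print(f"{code}: {count}")
```

In practice you would save the real output first, for example with `ruff check src/ --output-format json > report.json`, and pass the file's contents to the function.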
ruff format
The formatter rewrites your files to a consistent style (line length, quotes, trailing commas):
# Preview what would change
ruff format src/ --check
# Apply formatting
ruff format src/
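A before-and-after pair shows what "consistent style" means in practice. In this sketch the `build_row` function is a made-up example, and the formatted version is roughly what ruff format is expected to produce with default settings. The final comparison illustrates that formatting changes layout, not behaviour.

```python
# Style before formatting: cramped commas, stray spaces, single quotes.
MESSY = """
def build_row(name,values,sep = ','):
    cells=[name]+[str(v) for v in values]
    return sep.join( cells )
"""


# Roughly what `ruff format` is expected to produce for the same function.
def build_row(name, values, sep=","):
    cells = [name] + [str(v) for v in values]
    return sep.join(cells)


# Run the messy version in its own namespace to compare behaviour.
namespace = {}
exec(MESSY, namespace)
assert namespace["build_row"]("temp", [1, 2]) == build_row("temp", [1, 2])
print(build_row("temp", [1, 2]))  # temp,1,2
```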
<aside>
⌨️ Hands on: Run ruff format --check on your code. If it reports changes, run ruff format and inspect the diff with git diff.
</aside>
Ruff works out of the box, but you can customize it in pyproject.toml:
[tool.ruff]
line-length = 100
[tool.ruff.lint]
select = ["E", "F", "I"] # Errors, pyflakes, isort
ignore = ["E501"] # Skip line-length warnings
Common rule sets:
E: style errors (PEP 8)
F: logical errors (unused imports, undefined names)
I: import sorting
<aside> 💡 Start with the defaults. Add rules as your project grows. Do not spend time configuring every option upfront.
</aside>
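How select and ignore combine can be pictured with a deliberately simplified model: a rule is active when its code starts with a selected prefix and no ignored prefix matches it. This is not ruff's actual implementation. Ruff resolves overlapping prefixes more carefully, and `is_rule_enabled` is a hypothetical helper. For the configuration shown earlier, though, the model gives the same answers.

```python
SELECT = ["E", "F", "I"]  # values from the pyproject.toml example
IGNORE = ["E501"]


def is_rule_enabled(code: str, select=SELECT, ignore=IGNORE) -> bool:
    """Simplified model: selected by prefix, unless an ignore prefix matches."""
    selected = any(code.startswith(prefix) for prefix in select)
    ignored = any(code.startswith(prefix) for prefix in ignore)
    return selected and not ignored


for code in ["F401", "E501", "I001", "B007"]:
    print(code, is_rule_enabled(code))
```

F401 and I001 come out enabled. E501 is selected by "E" but switched off by ignore, and B007 is never selected because no "B" prefix is listed.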
Set up VS Code to lint and format on save so you never commit unformatted code:
Install the Ruff extension (charliermarsh.ruff).
Add the following to your settings.json:
{
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll.ruff": "explicit",
"source.organizeImports.ruff": "explicit"
}
}
}
This formats your code and fixes lint issues every time you save a Python file.
<aside> ⌨️ Hands on: Install the Ruff extension, add the settings above, and save a Python file. Confirm it auto-formats.
</aside>
That same auto-format-on-save habit should feel familiar from earlier.
<aside> 📘 Core program connection: In the Core program you set up Prettier for JavaScript auto-formatting. Ruff is the Python equivalent: same idea, different language.
</aside>
Run ruff as part of your development routine:
Run ruff check and ruff format --check to verify.
<aside> 🤓 Curious Geek: Why ruff is so fast
Ruff is written in Rust, not Python. It can lint an entire project in milliseconds where older Python-based tools take seconds. This makes it practical to run on every save without slowing you down.
</aside>
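Because both commands are quick, the routine above can be wrapped in one small script that you run before committing. This is a sketch, not part of the course repo: the `run_checks` helper and the `src/` path are assumptions. It relies only on the fact that ruff check and ruff format --check exit with a non-zero status when they find something.

```python
import subprocess

CHECKS = [
    ["ruff", "check", "src/"],
    ["ruff", "format", "--check", "src/"],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each command and return the ones that exited with a non-zero status."""
    failed = []
    for command in checks:
        result = runner(command)
        if result.returncode != 0:
            failed.append(" ".join(command))
    return failed


def main() -> int:
    """Print every failing command and return an exit code for the shell."""
    failed = run_checks()
    for command in failed:
        print(f"failed: {command}")
    return 1 if failed else 0
```

To use it, save it as a script and end it with `raise SystemExit(main())`. The non-zero exit code lets a Git hook or CI step stop on failure, and ruff's speed keeps the extra step from slowing you down.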
That speed has driven a quiet consolidation across the Python ecosystem: large projects have replaced their three-tool stacks with one ruff configuration.
<aside>
💡 In the wild: pandas, FastAPI, and Pydantic all migrated from black + flake8 + isort to a single ruff configuration in 2023-2024. The configuration lives in pyproject.toml, and the lint step in CI can drop from minutes to seconds.
</aside>
The speed makes one specific habit cheap: looking up rule codes you don't recognise.
<aside>
💡 Using AI to help: When ruff reports a rule code you don't recognise (E501, B007, RUF001, ...), paste the code into an LLM and ask what it means and why it's flagged. Then verify the explanation against the ruff rule reference: LLMs are usually right on the common rules and confidently wrong on the obscure ones.
</aside>
A deliberately broken Python file with 5 ruff violations across 5 different rule codes is published as a gist. Download it, run ruff check, identify each rule, and fix them one at a time.
pip install ruff
curl -L https://gist.githubusercontent.com/lassebenni/38c2e8fb384302cdcd2d0ecc82120b16/raw/messy.py -o messy.py
ruff check messy.py
Fix one violation, re-run, and watch the count drop. When stuck, compare against the reference fix in the repo, but only after you have tried it yourself.
<aside>
💡 The point is the loop: edit → ruff check → see fewer issues → repeat. That tight feedback is what makes ruff worth wiring into your editor.
</aside>
Add a [tool.ruff] section to your pyproject.toml with a custom line-length. Confirm ruff format --check reports no changes after a fresh ruff format run.
What is the difference between ruff check and ruff format?
What does the --fix flag do when running ruff check?
What does the [tool.ruff.lint] section in pyproject.toml control, and why is it kept separate from [tool.ruff]?
If ruff format and ruff check --fix are both available, which would you run first and why?
Next up: Practice, where you apply every concept from this week to small focused exercises before tackling the assignment.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
