Week 2 - Structuring Data Pipelines

Linting and Formatting with Ruff

In the Core program you used Prettier to auto-format JavaScript. Python has its own tools for this. This chapter introduces ruff, a fast linter and formatter that catches bugs and keeps your code consistent.

By the end of this chapter, you should be able to lint and format your code with ruff, configure basic rules, and set up your editor to format on save.

Why lint and format?

When you work in a team, everyone writes code slightly differently: different indentation, different import ordering, different quoting styles. These differences create noisy diffs in pull requests and hide real changes behind style changes.

A formatter rewrites your code to a consistent style automatically. A linter goes further: it catches potential bugs, unused imports, and patterns that often lead to errors.

Ruff does both. It replaces older tools like flake8 (linter), black (formatter), and isort (import sorter) in a single tool.
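To make the idea of a linter concrete, here is a toy unused-import check built on Python's standard `ast` module. It is only a sketch of the kind of analysis a linter performs, not how ruff is implemented, and the names `SOURCE` and `unused_imports` are made up for this illustration.

```python
import ast

# A small piece of source code to analyse: `os` is imported but never used.
SOURCE = '''
import os
import json

def load(line):
    return json.loads(line)
'''


def unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a" in the module
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Every plain name that appears in an expression or assignment
            used.add(node.id)
    return sorted(imported - used)


print(unused_imports(SOURCE))  # ['os']
```

A formatter would leave this finding alone: it only changes how the code looks, never which names the code uses.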

Install ruff

Ruff is a development tool, not part of your pipeline code, so it does not belong in your production dependencies. With pip, install it into your project's virtual environment:

pip install ruff

If you use uv, add it as a dev dependency:

uv add --dev ruff

Linting with ruff check

Run the linter on your source code:

ruff check src/

Ruff reports issues with a rule code (e.g. F401 for unused imports) and a description. Fix them manually, or let ruff fix what it can:

ruff check src/ --fix
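As an illustration of what such a report refers to, the hypothetical file below contains three issues that I would expect ruff's default rules to flag. The function names are invented for this example, and the comments paraphrase the rules rather than quote ruff's exact messages.

```python
import json
import os  # F401: `os` is imported but never used


def parse_record(line):
    record = json.loads(line)
    count = len(record)  # F841: `count` is assigned but never used
    if record.get("id") == None:  # E711: compare to None with `is`, not `==`
        return None
    return record


# The same function after fixing the three findings by hand.
def parse_record_fixed(line):
    record = json.loads(line)
    if record.get("id") is None:
        return None
    return record
```

Both versions behave the same for ordinary input. The linter's job here is to remove noise (the unused import and variable) and a fragile comparison before they cause confusion later.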

<aside> ⌨️ Hands on: Run ruff check on your Week 2 code. Fix any issues it reports. Note which rule codes appear most often.

</aside>

Formatting with ruff format

The formatter rewrites your files to a consistent style (line length, quotes, trailing commas):

# Preview what would change
ruff format src/ --check

# Apply formatting
ruff format src/
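The sketch below shows the kind of change the formatter makes. The "after" half is roughly what I would expect ruff format to produce with its default settings. The `_before` and `_after` suffixes exist only so both versions fit in one runnable file; the formatter itself never renames anything.

```python
# Before formatting: valid Python, inconsistent spacing and quotes.
def total_before( prices,tax = 0.2 ):
    return sum( prices )*( 1+tax )

label_before = 'total'


# After formatting: same behaviour, consistent style.
def total_after(prices, tax=0.2):
    return sum(prices) * (1 + tax)


label_after = "total"
```

Only whitespace and quote characters differ, so the two versions compute exactly the same result. That is why formatting changes are safe to apply automatically.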

<aside> ⌨️ Hands on: Run ruff format --check on your code. If it reports changes, run ruff format and inspect the diff with git diff.

</aside>

Configuring ruff

Ruff works out of the box, but you can customize it in pyproject.toml:

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, Pyflakes, isort (import sorting)
ignore = ["E501"]          # Skip line-length warnings
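To see what selecting "I" in the configuration above adds, consider import ordering. The sketch below shows a hypothetical module after its imports have been sorted, with the original order kept in a comment. The sorted order is what I would expect from ruff's default isort settings, and `describe` is an invented function that simply uses every import.

```python
# Before: imports in the order they happened to be typed.
# With "I" selected, ruff check reports this block as unsorted (rule I001):
#
#   import sys
#   from pathlib import Path
#   import json
#
# After `ruff check --fix`: plain imports first, alphabetically,
# followed by the `from ... import ...` lines.
import json
import sys
from pathlib import Path


def describe(path: Path) -> str:
    """Return a small JSON summary of a file path."""
    return json.dumps({"file": path.name, "python": sys.version_info.major})
```

A stable import order means two people adding imports to the same file produce the same result, which keeps diffs small.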

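Common rule sets:

  E: pycodestyle errors (for example E501, line too long)
  F: Pyflakes (for example F401, unused import)
  I: isort (import sorting)
  B: flake8-bugbear (likely bugs and design problems)
  RUF: ruff-specific rules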

<aside> 💡 Start with the defaults. Add rules as your project grows. Do not spend time configuring every option upfront.

</aside>

Editor integration

Set up VS Code to lint and format on save so you never commit unformatted code:

  1. Install the Ruff extension (charliermarsh.ruff).
  2. Open VS Code settings (JSON) and add:
{
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll.ruff": "explicit",
      "source.organizeImports.ruff": "explicit"
    }
  }
}

This formats your code and fixes lint issues every time you save a Python file.

<aside> ⌨️ Hands on: Install the Ruff extension, add the settings above, and save a Python file. Confirm it auto-formats.

</aside>

That same auto-format-on-save habit should feel familiar from earlier.

<aside> 📘 Core program connection: In the Core program you set up Prettier for JavaScript auto-formatting. Ruff is the Python equivalent: same idea, different language.

</aside>

Linting in your workflow

Run ruff as part of your development routine:

  1. While coding: editor integration catches issues on save.
  2. Before committing: run ruff check and ruff format --check to verify.
  3. In CI (covered in the cloud weeks): the same commands run automatically on every push.
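The "before committing" step can be wrapped in a small script. This is a minimal sketch, assuming ruff is installed and your code lives in src/. The names `CHECKS` and `run_checks` are hypothetical. It relies on both commands exiting with a non-zero status when they find something to report.

```python
import subprocess

# The two verification commands from the workflow above.
CHECKS = [
    ["ruff", "check", "src/"],
    ["ruff", "format", "--check", "src/"],
]


def run_checks(run=subprocess.run):
    """Run every check and return the commands that failed.

    `run` defaults to subprocess.run and can be swapped out in tests.
    """
    failed = []
    for cmd in CHECKS:
        result = run(cmd)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed


# Typical use at the end of a script:
#   import sys
#   sys.exit(1 if run_checks() else 0)
```

Running both checks even when the first one fails means a single run shows everything that needs attention.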

<aside> 🤓 Curious Geek: Why ruff is so fast

Ruff is written in Rust, not Python. It can lint an entire project in milliseconds where older Python-based tools take seconds. This makes it practical to run on every save without slowing you down.

</aside>

That speed has driven a quiet consolidation across the Python ecosystem: large projects have replaced their three-tool stacks with one ruff configuration.

<aside> 💡 In the wild: pandas, FastAPI, and Pydantic all migrated from black + flake8 + isort to a single ruff configuration in 2023-2024. The configuration lives in one [tool.ruff] section of pyproject.toml, and the lint step in CI can drop from minutes to seconds.

</aside>

Fast feedback also means you will run into rule codes you don't recognise. Looking them up is a habit worth building.

<aside> 💡 Using AI to help: When ruff reports a rule code you don't recognise (E501, B007, RUF001, ...), paste the code into an LLM and ask what it means and why it's flagged. Then verify the explanation against the ruff rule reference: LLMs are usually right on the common rules and confidently wrong on the obscure ones.

</aside>

⌨️ Hands on: fix the messy file

A deliberately broken Python file with 5 ruff violations across 5 different rule codes is published as a gist. Download it, run ruff check, identify each rule, and fix them one at a time.

pip install ruff
curl -L https://gist.githubusercontent.com/lassebenni/38c2e8fb384302cdcd2d0ecc82120b16/raw/messy.py -o messy.py
ruff check messy.py

Fix one violation, re-run, and watch the count drop. If you get stuck, compare against the reference fix in the repo, but only after you have tried it yourself.

<aside> 💡 The point is the loop: edit → ruff check → see fewer issues → repeat. That tight feedback is what makes ruff worth wiring into your editor.

</aside>

Exercises

  1. Add a [tool.ruff] section to your pyproject.toml with a custom line-length. Confirm ruff format --check reports no changes after a fresh ruff format run.
  2. Explain the difference between a linter and a formatter to someone who has never used either: one short paragraph, two concrete examples (one for each).



Next up: Practice, where you apply every concept from this week to small focused exercises before tackling the assignment.


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
