Week 2 - Structuring Data Pipelines

Linting and Formatting with Ruff

In the Core program you used Prettier to auto-format JavaScript. Python has its own tools for this. This chapter introduces ruff, a fast linter and formatter that catches bugs and keeps your code consistent.

By the end of this chapter, you should be able to lint and format your code with ruff, configure basic rules, and set up your editor to format on save.

Concepts

Why lint and format?

When you work in a team, everyone writes code slightly differently: different indentation, different import ordering, different quoting styles. These differences create noisy diffs in pull requests and hide real changes behind style changes.

A formatter rewrites your code to a consistent style automatically. A linter goes further: it catches potential bugs, unused imports, and patterns that often lead to errors.
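To make the distinction concrete, here is a small sketch. The function `average` is a made-up example, not part of the course code. It contains one issue a linter cares about and one a formatter cares about.

```python
def average(values):
    # Linter territory: `count` is assigned but never used, which often
    # signals a forgotten step or leftover code (ruff reports it as F841).
    count = len(values)

    # Formatter territory: the missing spaces around "/" are only a style
    # issue. A formatter rewrites this line; the result stays the same.
    return sum(values)/len(values)


print(average([2, 4, 6]))  # 4.0
```

The code runs fine either way. The linter points at something that might be a mistake, while the formatter only changes how the code looks.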

Ruff does both. It combines the jobs of older tools such as flake8 (linter), black (formatter), and isort (import sorter) in a single tool.

Install ruff

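Add ruff as a dev dependency. It is a development tool, not part of your pipeline code, so it does not belong in your production dependencies.

With pip, install it into your project's virtual environment. Plain pip install does not record ruff as a dev dependency, so if you track dependencies in requirements files, list it in a separate dev file (for example requirements-dev.txt):

pip install ruff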

If you use uv:

uv add --dev ruff

Linting with ruff check

Run the linter on your source code:

ruff check src/

Ruff reports issues with a rule code (e.g. F401 for unused imports) and a description. Fix them manually, or let ruff fix what it can:

ruff check src/ --fix
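As a rough illustration of what --fix does, the sketch below shows a hypothetical module before and after. The function `to_json` is invented for this example. The "before" version is kept in comments so the block stays runnable.

```python
# Before `ruff check --fix`:
#
#   import json
#   import sys          # F401: imported but unused
#
#   def to_json(record):
#       return json.dumps(record, sort_keys=True)

# After: the unused import is removed, the logic is untouched.
import json


def to_json(record):
    return json.dumps(record, sort_keys=True)


print(to_json({"b": 2, "a": 1}))  # {"a": 1, "b": 2}
```

Not every issue can be fixed automatically. Ruff only applies fixes it considers safe by default, so expect to handle some reports by hand.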

<aside> ⌨️ Hands on: Run ruff check on your Week 2 code. Fix any issues it reports. Note which rule codes appear most often.

</aside>

Formatting with ruff format

The formatter rewrites your files to a consistent style (line length, quotes, trailing commas):

# Preview what would change
ruff format src/ --check

# Apply formatting
ruff format src/
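Here is a sketch of the kind of change to expect, assuming ruff's default settings. The function is hypothetical, and the two names (`build_row_before`, `build_row_after`) exist only so both versions can sit in one runnable block.

```python
# Before formatting: valid Python, but cramped spacing and single quotes.
def build_row_before(name,amount,currency='EUR'):
    return {'name':name,'amount':amount,'currency':currency}


# After formatting: roughly what `ruff format` produces with its defaults.
def build_row_after(name, amount, currency="EUR"):
    return {"name": name, "amount": amount, "currency": currency}


# Formatting changes layout only, so both versions behave identically.
print(build_row_before("coffee", 3.5) == build_row_after("coffee", 3.5))  # True
```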

<aside> ⌨️ Hands on: Run ruff format --check on your code. If it reports changes, run ruff format and inspect the diff with git diff.

</aside>

Configuring ruff

Ruff works out of the box, but you can customize it in pyproject.toml:

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, Pyflakes, isort (import sorting)
ignore = ["E501"]          # Skip line-too-long warnings
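The "I" entry turns on import sorting. As a sketch of its effect, assuming ruff's default isort settings, the block below shows a hypothetical module whose imports were typed in arbitrary order. The function `describe` is invented for the example.

```python
# Before (as comments): imports in the order they were typed.
#
#   import sys
#   from pathlib import Path
#   import json
#
# After `ruff check --fix` with "I" selected, you would expect:
import json
import sys
from pathlib import Path


def describe(path):
    """Return a JSON string with a file's name and suffix."""
    p = Path(path)
    return json.dumps({"name": p.name, "suffix": p.suffix})


print(describe("data/input.csv"), file=sys.stdout)
```

Plain `import x` lines come first in alphabetical order, followed by `from x import y` lines. With third-party packages in the mix, ruff also groups them separately from the standard library.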

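Common rule sets: E (pycodestyle errors), F (Pyflakes: unused imports, undefined names), and I (isort: import ordering). Two others you may meet later are B (flake8-bugbear: likely bugs) and UP (pyupgrade: outdated syntax).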

<aside> 💡 Start with the defaults. Add rules as your project grows. Do not spend time configuring every option upfront.

</aside>

Editor integration

Set up VS Code to lint and format on save so you never commit unformatted code:

  1. Install the Ruff extension (charliermarsh.ruff).
  2. Open VS Code settings (JSON) and add:
{
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll.ruff": "explicit",
      "source.organizeImports.ruff": "explicit"
    }
  }
}

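This formats your code and applies ruff's automatic lint fixes every time you save a Python file. The "explicit" value means the fix and import-sorting actions run when you save deliberately (for example with Ctrl+S), not on auto-save.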

<aside> ⌨️ Hands on: Install the Ruff extension, add the settings above, and save a Python file. Confirm it auto-formats.

</aside>

<aside> 📘 Core program connection: In the Core program you set up Prettier for JavaScript auto-formatting. Ruff is the Python equivalent: same idea, different language.

</aside>

Linting in your workflow

Run ruff as part of your development routine:

  1. While coding: editor integration catches issues on save.
  2. Before committing: run ruff check and ruff format --check to verify.
  3. In CI (Week 5): the same commands run automatically on every push.
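The "before committing" step can be wrapped in a small helper so you run one command instead of two. This is a minimal sketch, not an official ruff feature: the names `CHECKS`, `run_checks`, and `main` are made up, and it assumes ruff is installed and on your PATH.

```python
import subprocess
import sys

# The two commands from step 2. "src/" is this chapter's example path.
CHECKS = [
    ["ruff", "check", "src/"],
    ["ruff", "format", "src/", "--check"],
]


def run_checks(commands):
    """Run each command and return the ones that exited with a non-zero code."""
    failed = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command)
    return failed


def main():
    """Run all checks and return an exit code: 0 if clean, 1 otherwise."""
    failed = run_checks(CHECKS)
    for command in failed:
        print("FAILED:", " ".join(command), file=sys.stderr)
    return 1 if failed else 0
```

Saved as a script (say check.py, a hypothetical name), you would end the file with sys.exit(main()). A non-zero exit code is the same signal a CI system uses to mark a run as failed.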

<aside> 🤓 Curious Geek: Why ruff is so fast

Ruff is written in Rust, not Python. Its documentation claims it is 10 to 100 times faster than older Python-based tools such as flake8 and black. This makes it practical to run on every save without slowing you down.

</aside>

Exercises

  1. Run ruff check on your Week 2 code and fix all reported issues.
  2. Run ruff format and review the changes it makes.
  3. Add a [tool.ruff] section to your pyproject.toml with a custom line length.
  4. Set up the Ruff VS Code extension and verify format-on-save works.
  5. Explain the difference between a linter and a formatter.

🧠 Knowledge Check

  1. What is the difference between ruff check and ruff format?
  2. Why should ruff be a dev dependency, not a production dependency?
  3. What does the --fix flag do when running ruff check?

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*
