Week 2 - Structuring Pipelines

Career relevance: Week 2

Indicative as of May 2026: see Sources for current numbers.

This page answers two questions students ask every week: why am I learning this, and how does it help me find a job?

It is scoped to Week 2 content: structuring Python pipelines through configuration management, separation of concerns, dataclasses, functional composition, pytest, and ruff. The career pages for other weeks cover their own tools. Generic NL junior-data career content (salary bands, day-to-day work, what employers do not expect from juniors) belongs on a single shared page for the whole curriculum and is not repeated here.

The numbers below are a rough reading of public NL postings as of May 2026. They are indicative, not measured. A separate project crawls Dutch data postings and will replace the qualitative claims here with measured percentages once the dataset is ready; placeholders are marked ~XX% for that swap.

How "structured Python" shows up in NL postings

Almost every NL data posting that touches Python expects the patterns from this week, but rarely names them explicitly. Phrases such as "clean code", "modular design", "writes tests", and "production-ready Python" all map to what you practiced. The language postings use matters more here than the frequency by role.

| Role | Postings expecting structured-Python skills | What the posting expects |
| --- | --- | --- |
| Data Engineer (DE) | ~85% | Hands-on. "Production Python", "modular pipeline code", "writes tests for transformation logic", "uses environment variables for config". The Week 2 patterns are table-stakes, not differentiators. |
| Analytics Engineer (AE) | ~30% | Lighter expectation: most AE work happens in dbt/SQL, but utility scripts and CI tooling still ride on Python. "Comfortable with Python" usually means the Week 2 register. |
| Data Scientist (DS) | ~50% | Notebook-first roles often skip these patterns entirely; product-DS roles (recommender systems, ML pipelines) expect them at the DE level. The split is roughly notebook-DS vs ML-engineer-DS. |
| Data Analyst (DA) | ~10% | Rarely required. Some scale-ups hire analysts who maintain shared Python utilities; most do not. |

The directional shape: DE roles assume you write structured Python; AE/DS roles assume you can read it; DA roles do not require it, but it is a CV differentiator. If you are aiming at DE postings, this week is the floor. Recruiters do not ask "do you separate I/O from logic?" the way they ask "do you know dbt?", but interviewers will see whether you do it in your code-screen submission.
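To make "separate I/O from logic" concrete, here is a minimal sketch of what an interviewer might look for in a code-screen submission. The function names (`read_rows`, `drop_negative_prices`, `write_rows`) and the `price` column are assumptions chosen for illustration, not the assignment's actual code.

```python
import csv
import io


def read_rows(source):
    """I/O layer: parse CSV text from any file-like object into dicts."""
    return list(csv.DictReader(source))


def drop_negative_prices(rows):
    """Pure logic: return a new list and leave the input untouched."""
    return [row for row in rows if float(row["price"]) >= 0]


def write_rows(rows, target, fieldnames):
    """I/O layer: write dicts back out as CSV to any file-like object."""
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# The transform can be exercised with in-memory data, no filesystem needed.
sample = io.StringIO("item,price\napple,1.50\nrefund,-2.00\n")
rows = read_rows(sample)
clean = drop_negative_prices(rows)
```

Because `drop_negative_prices` never opens a file, a test can call it with a plain list of dicts; only the two thin I/O functions need a file-like object.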

The Week 2 stack vs alternatives in NL

The tools and patterns this week introduces have NL-market-standard alternatives. Most postings do not name a specific library; they name the pattern and expect you to bring tools to it.

| Concept | Tool taught | Common NL alternatives | Practical implication |
| --- | --- | --- | --- |
| Config & secrets | python-dotenv + os.environ | pydantic-settings, dynaconf, AWS Secrets Manager, Azure Key Vault | All build on the same env-var contract. NL teams on Azure typically layer Key Vault on top in production (covered later in the track). The .env pattern is what runs locally. |
| Data modelling | @dataclass | Pydantic (very common in production, ~XX% of postings name it explicitly when validation comes up), attrs, plain dicts | Pydantic is widely used for validated data models in 2026 NL postings. Dataclasses are the gateway pattern; the next week introduces Pydantic on top. |
| Functional composition | hand-rolled pipe() | toolz, returns, pandas method chaining | Most production pipelines compose with simple variable rebinding (the way you learned), not with a pipe library. Pandas chaining is the same pattern with different syntax (introduced when pandas comes in). |
| Testing | pytest | unittest (legacy), nose (deprecated) | pytest is the standard. unittest shows up in older codebases at banks and government employers. |
| Linting/formatting | ruff | black + flake8 + isort (the stack ruff replaced) | ruff has effectively won the NL market in the last 18 months; postings still occasionally name black because the description is older than the codebase. |
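The two composition styles mentioned for functional composition can be compared side by side. This is a hedged sketch: the `pipe()` signature here is one common hand-rolled form and may differ from the one used in the chapter, and `strip_names` and `drop_empty_names` are hypothetical transforms.

```python
from functools import reduce


def pipe(value, *functions):
    """Pass value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def strip_names(rows):
    """Return new dicts with whitespace trimmed from the name field."""
    return [{**row, "name": row["name"].strip()} for row in rows]


def drop_empty_names(rows):
    """Return only the rows whose name is non-empty."""
    return [row for row in rows if row["name"]]


raw = [{"name": "  Ada "}, {"name": "   "}]

# Style 1: plain variable rebinding, one step per line.
rows = strip_names(raw)
rows = drop_empty_names(rows)

# Style 2: the same steps expressed through pipe().
piped = pipe(raw, strip_names, drop_empty_names)
```

Both styles produce the same result and leave `raw` unchanged; rebinding is easier to step through in a debugger, while `pipe()` keeps the order of steps in one place.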

What this means for your CV: lead with "structured Python (config, dataclasses, pytest, ruff)" as a single phrase. Recruiters scan for the keywords; interviewers care that you can defend why each pattern is there.

Junior vs medior expectations

Postings phrase the expectation at three levels:

The chapter does not yet practice Pydantic, mypy, or full dependency management. The next week introduces Pydantic, and the type-checking and packaging pieces come later. That progression is intentional: Week 2 is the structural foundation everything else builds on.

How Week 2 work signals on a CV

Strong line a student can copy-adapt:

Refactored a single-script CSV pipeline into a modular Python project: separated I/O from transform logic, modelled Transaction records as a @dataclass with __post_init__ validation, composed transforms as pure functions returning new collections (no mutation), wrote a pytest suite covering the pure transforms with @pytest.fixture and @pytest.mark.parametrize, loaded configuration from .env via a centralised config.py, and enforced style with ruff check + ruff format.

Recruiter keywords this carries: Python, dataclasses, pytest, fixtures, parametrize, pure functions, separation of concerns, dependency injection, dotenv, ruff, modular design.

Weaker alternative for contrast (avoid):

Wrote some Python scripts that clean CSV files.

The weaker version drops every recruiter keyword and could be claimed by anyone who has ever opened a CSV in a notebook. The strong version names the specific patterns and the specific tools.
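If an interviewer asks you to back up the strong line, the dataclass and pure-transform parts might look roughly like the sketch below. The field names, the validation rules, the `apply_discount` transform, and the choice of `frozen=True` are all assumptions for illustration; the assignment's real `Transaction` may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen is an assumption here: it makes mutation an error
class Transaction:
    item: str
    price: float
    quantity: int

    def __post_init__(self):
        # Validate at construction so bad rows fail early and loudly.
        if self.price < 0:
            raise ValueError(f"price must be non-negative, got {self.price}")
        if self.quantity <= 0:
            raise ValueError(f"quantity must be positive, got {self.quantity}")


def apply_discount(transactions, rate):
    """Pure transform: build new Transaction objects instead of mutating."""
    return [
        replace(t, price=round(t.price * (1 - rate), 2))
        for t in transactions
    ]


original = [Transaction("apple", 10.0, 2)]
discounted = apply_discount(original, 0.1)
```

`dataclasses.replace` creates a new instance and runs `__post_init__` again, so every transformed record is re-validated while `original` stays as it was.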

Interview phrasing for the Week 2 assignment

Three sentences that cover the assignment cleanly when an interviewer asks "tell me about a project you have built":

  1. "I refactored a 'god script' that did read, transform and write in one function into a modular pipeline: a thin I/O layer for CSV reading and writing, a Transaction dataclass for the row schema, and a chain of pure transform functions for the business rules. The point was that I could test each transform without touching the filesystem."
  2. "For testing, I used pytest with fixtures for shared sample rows and parametrize for edge cases (empty list, single row, negative price). The pure-function design meant the tests were fast and explicit: no setup, no teardown, just inputs and expected outputs."
  3. "Configuration lived in a .env file loaded by a centralised config.py that raised ValueError if a required variable was missing. That pattern was deliberate: I wanted the pipeline to fail loudly at startup rather than crash mid-run with a NoneType error."
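The "fail loudly at startup" idea from the third sentence can be sketched as follows. The helper name `require_env` and the variable name `INPUT_PATH` are hypothetical; in a real project, python-dotenv's `load_dotenv()` would typically be called first so that values from .env end up in `os.environ`.

```python
import os


def require_env(name, environ=None):
    """Return a required setting, or raise ValueError if it is missing."""
    # Accepting a mapping makes the helper testable without touching
    # the real process environment; os.environ is the default.
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise ValueError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage in a config.py, evaluated once at import time:
settings = {"INPUT_PATH": "data/in.csv"}
input_path = require_env("INPUT_PATH", settings)
```

Resolving every required variable once at import time means a missing value stops the pipeline before any data is read, rather than surfacing later as a `None` passed into a transform.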

Two honest follow-ups if asked "what would you do differently?":

What Week 2 does not make you

Week 2 is the foundation, not the finished article. After this week you are not yet:

These are the senior-shaped skills the chapter does not yet make you qualified for. Naming them honestly in an interview ("I know where my skills end") is more impressive than a junior overclaim.

Sources

Treat this page as indicative, not statistical. The numbers will be replaced with measured percentages once the postings-crawler project ships.


<aside> 💭 For generic NL junior data-career content (salary bands, day-to-day work, what employers do not expect from any junior), one shared page across all weeks is the right home. That page does not exist yet; for now, treat this page as Week-2-specific only.

</aside>


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
