Career relevance: Week 3

Indicative as of May 2026: see Sources for current numbers.

This page answers two questions students ask every week: why am I learning this, and how does it help me find a job?

It is scoped to Week 3 content (data ingestion: REST APIs with requests, retry logic and exponential backoff, structured error handling, reading CSV / JSON / Parquet, Pydantic validation, SQLite writes with parameterized queries and upserts). Other weeks' career pages each cover their week's tools, not these. Generic NL junior-data career content (salary bands, day-to-day work, what employers do not expect from juniors) belongs on one shared page across the curriculum and is not repeated here.

The numbers below are a rough reading of public NL postings as of May 2026. They are indicative, not measured. A separate project crawls Dutch data postings and will replace the qualitative claims here with measured percentages once the dataset is ready; placeholders are marked ~XX% for that swap.

How "data ingestion" shows up in NL postings

Ingestion is the part of data engineering that connects your code to the real world. NL postings phrase it in many ways ("integrates with external sources", "builds API connectors", "validates incoming data"), but the underlying skills are the same five: HTTP clients, retry logic, validation, file formats, and database writes. Week 3 covers all five.

Role	Postings expecting Week 3's ingestion stack	What the posting expects
Data Engineer (DE)	~XX% (high, likely 80%+)	"Builds and maintains ingestion pipelines from REST APIs and files", "implements retry logic and error handling", "validates incoming data before it reaches the warehouse". Week 3 is the bread-and-butter of junior DE work; postings expect this on day one and do not teach it in onboarding.
Analytics Engineer (AE)	~XX% (mid, likely 40-50%)	Lighter and skewed toward the validation side. Most AE work is dbt / SQL transformations, but utility scripts that fetch reference data (currency rates, country lookups) from an API and load to the warehouse are a routine AE task. "Comfortable with Python and APIs" usually means the Week 3 register: read a JSON response, validate it, write to a table.
Data Scientist (DS)	~XX% (mid, likely 30-40%)	Notebook-DS roles ingest from APIs to feed feature pipelines; ML-engineer-DS roles routinely build the ingestion side themselves. Postings list "Python, requests, Pydantic" or "experience with REST APIs" among the must-haves. Deep SQL writes are less common: the DS side leans on the API and validation pieces of Week 3 more than the database piece.
Data Analyst (DA)	~XX% (low, likely 10-20%)	Rare. Most DA work consumes data that someone else ingested. Postings occasionally mention "comfortable pulling from an API" for self-service contexts. When they do, the bar is Ch2 level (one `requests.get` + JSON parsing), not the full retry-and-validation stack.

The directional shape: Week 3 maps tightly onto the junior DE role and the ingestion-heavy slice of every other data role. If a posting says "builds data pipelines" without further qualification, this is what they mean by "the input side."

The Week 3 stack vs alternatives in NL

The chapters teach the standard-library and Pydantic defaults. NL postings name a wider range of alternatives once you reach medior level.

Concept	Tool taught	Common NL alternatives	Practical implication
HTTP client	`requests`	`httpx` (async + HTTP/2), `aiohttp`, plain `urllib`	`requests` is the universal floor. Postings that name a specific client usually mean `httpx`, because async ingestion against many sources is the obvious next step. The mental model (status codes, headers, params, timeouts) transfers 1:1.
Retry logic	manual `time.sleep(2 ** n)` retry loop	`tenacity` decorators, `urllib3.Retry` adapter, framework-level retries (Airflow / Prefect task retries)	Hand-rolled retries clear the junior bar. Production teams use `tenacity` or a framework retry to avoid scattering retry logic across every source. The pattern (transient vs permanent, exponential backoff, jitter) is what interviewers actually test for.
Validation	Pydantic v2 with `BaseModel`, `Field`, `@field_validator`	`pydantic` is the de-facto NL default; alternatives are `dataclasses` + manual `__post_init__`, `marshmallow`, `cerberus`, `attrs` + custom validators	Postings increasingly name Pydantic explicitly. If they do not, "validates data against a schema" is what they mean. Knowing Pydantic v2 syntax (not v1's `@validator` / `.dict()`) is the medior-level expectation.
File formats	stdlib `csv`, `json`; `pyarrow` / `pandas` for Parquet	`pandas.read_csv` / `read_json` / `read_parquet`, `polars`, `duckdb` (in-process SQL over Parquet)	The stdlib `csv` and `json` modules are the right place to start. Once volumes grow past a few hundred MB, postings expect `pandas` (Week 4) or `polars`. `duckdb` shows up in NL postings for analytics-engineering roles that want SQL semantics over Parquet without a warehouse.
Declarative ingestion	(not taught this week)	`dlt` (data load tool, very active NL community), `Airbyte` (connector platform), `Singer` / `Meltano`, `Fivetran` (managed SaaS)	Week 3 builds ingestion by hand on purpose. NL teams adopt `dlt` or `Airbyte` once they have 5+ sources to maintain; the SQL and HTTP patterns you learn here are what those frameworks implement under the hood.
Database writes	SQLite + `?` parameters + `ON CONFLICT DO UPDATE`	PostgreSQL with `psycopg` / SQLAlchemy, Azure Database for PostgreSQL (the HYF class DB), MotherDuck / DuckDB for analytical writes	SQLite is the training-wheels choice. Almost every NL DE posting names PostgreSQL specifically; Azure shops list Azure Database for PostgreSQL or Azure SQL. The parameterized-query and upsert patterns you learned transfer almost unchanged: the connection library swaps, and with it the placeholder style (`psycopg` uses `%s` where SQLite uses `?`).
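
The database-writes row can be made concrete with a minimal sketch of a parameterized upsert in SQLite. The table name, column names, and sample values below are assumptions chosen for illustration, not the assignment's actual schema.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        station       TEXT NOT NULL,
        timestamp     TEXT NOT NULL,
        temperature_c REAL NOT NULL,
        PRIMARY KEY (station, timestamp)
    )
    """
)

# "?" placeholders keep values out of the SQL string (no injection, no quoting bugs).
# ON CONFLICT turns a duplicate key into an update instead of an error.
UPSERT_SQL = """
INSERT INTO readings (station, timestamp, temperature_c)
VALUES (?, ?, ?)
ON CONFLICT(station, timestamp) DO UPDATE SET
    temperature_c = excluded.temperature_c
"""


def write_readings(connection, rows):
    # "with connection" commits on success and rolls back if any row fails.
    with connection:
        connection.executemany(UPSERT_SQL, rows)


batch = [
    ("AMS", "2026-05-01T12:00", 14.2),
    ("RTM", "2026-05-01T12:00", 15.1),
]
write_readings(conn, batch)
write_readings(conn, batch)  # re-running the same input adds no duplicates
write_readings(conn, [("AMS", "2026-05-01T12:00", 14.9)])  # a corrected value overwrites

row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
ams_temp = conn.execute(
    "SELECT temperature_c FROM readings WHERE station = ?", ("AMS",)
).fetchone()[0]
```

After the three writes the table still holds two rows, and the AMS row carries the corrected value. PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE SET` clause; with `psycopg` the placeholders would be `%s` instead of `?`.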

What this means for your CV: lead with "Python ingestion (REST APIs with requests + retry, Pydantic validation, SQLite / PostgreSQL writes with upserts)" as a single phrase, not a checklist of modules.

Junior vs medior expectations

Postings phrase the ingestion bar at three levels:

Junior: "Comfortable consuming REST APIs", "writes basic retry logic", "knows what Pydantic is". Your Week 3 work clears this bar: you can fetch from an API with a timeout, retry on transient failure, validate every record with a Pydantic model, and write the survivors to a database idempotently.
Medior: "Designs ingestion patterns for new sources", "owns retry budgets and SLAs for upstream APIs", "evaluates dlt vs custom code with the team". Goes beyond Week 3: typically 1+ year of shipping ingestion code that runs unattended on a schedule. Also: schema-evolution handling, dead-letter routing, observability for ingestion lag.
Senior / Lead: Defines the ingestion architecture across the platform; sets data-contract policy; out of scope for Week 3.

Week 3 does not yet practice async ingestion (httpx + asyncio), production retry libraries (tenacity), declarative frameworks (dlt), or non-SQLite databases. Those are the bridge from junior to medior and show up later in the track (the cloud-databases week swaps SQLite for Postgres; the orchestration week introduces task-level retries via Airflow). Week 3 is the foundation a hiring manager assumes when they invite a junior to a take-home assignment.
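
The junior bar described above (retry on transient failure, give up on permanent failure) can be sketched with the standard library alone. Everything named here is an assumption for illustration: `fetch` stands in for any zero-argument callable that returns `(status_code, body)`, the set of transient status codes is one common choice, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import random
import time

# One common classification; adjust per upstream API.
TRANSIENT_STATUS = {429, 500, 502, 503, 504}


class PermanentError(Exception):
    """The request will not succeed however often it is retried."""


class RetriesExhausted(Exception):
    """Every attempt hit a transient failure."""


def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            status, body = fetch()
        except (ConnectionError, TimeoutError):
            # Network-level failure: treated as transient.
            # With requests, catch requests.ConnectionError and requests.Timeout here.
            pass
        else:
            if 200 <= status < 300:
                return body
            if status not in TRANSIENT_STATUS:
                raise PermanentError(f"HTTP {status}: retrying will not help")
        if attempt < max_attempts - 1:
            # Exponential backoff (1s, 2s, 4s, ...) plus jitter so that
            # many clients do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RetriesExhausted(f"gave up after {max_attempts} attempts")


# Simulated upstream: two 503 responses, then success.
responses = iter([(503, None), (503, None), (200, {"temperature_c": 14.2})])
delays = []
result = fetch_with_retry(lambda: next(responses), sleep=delays.append)
```

The call returns the third response after two recorded backoff delays. A 404 from the same function raises `PermanentError` on the first attempt without sleeping.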

How Week 3 work signals on a CV

Strong line a student can copy-adapt:

Built a resilient weather-data ingestion pipeline in Python: fetched hourly readings from the Open-Meteo REST API with requests (timeout=10, exponential-backoff retry on ConnectionError and Timeout, classified transient vs permanent HTTP errors), normalized CSV and JSON inputs into a shared schema, validated every record with a Pydantic v2 BaseModel (Field(ge=-90, le=60) constraints, @field_validator for timestamp parsing), and wrote survivors to SQLite using parameterized INSERT ... ON CONFLICT(station, timestamp) DO UPDATE SET for idempotent re-runs. Accumulated per-record errors into a structured report instead of crashing on the first bad row.

Recruiter keywords this carries: Python, requests, retry, exponential backoff, transient vs permanent errors, Pydantic, validation, SQLite, ON CONFLICT, upsert, idempotency, ETL, ingestion, CSV, JSON, Parquet, REST API, parameterized queries.

Weaker alternative for contrast (avoid):

Wrote a Python script that downloads weather data and stores it in a database.

The weaker version drops every recruiter keyword and could be claimed by anyone who has ever run pip install requests. The strong version names the specific patterns (retry classification, Pydantic constraints, ON CONFLICT upserts, error accumulation) that signal the candidate has built ingestion code that survives Monday morning when the upstream API is flaky.
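
Two of the patterns the strong line names, Pydantic constraints and error accumulation, fit in a short sketch. It assumes Pydantic v2 is installed; the field names, the temperature bounds, and the day-first partner timestamp format are illustrative assumptions rather than the assignment's real schema.

```python
from datetime import datetime

from pydantic import BaseModel, Field, ValidationError, field_validator


class Reading(BaseModel):
    station: str = Field(min_length=1)
    timestamp: datetime
    temperature_c: float = Field(ge=-90, le=60)

    @field_validator("timestamp", mode="before")
    @classmethod
    def parse_timestamp(cls, value):
        # Hypothetical partner format "01-05-2026 13:00" (day first).
        if isinstance(value, str):
            try:
                return datetime.strptime(value, "%d-%m-%Y %H:%M")
            except ValueError:
                return value  # fall through to Pydantic's own ISO 8601 parsing
        return value


def validate_all(rows):
    """Validate every row; collect failures instead of stopping at the first."""
    valid, errors = [], []
    for index, row in enumerate(rows):
        try:
            valid.append(Reading.model_validate(row))
        except ValidationError as exc:
            problems = []
            for err in exc.errors():
                field = ".".join(str(part) for part in err["loc"])
                problems.append(f"{field}: {err['msg']}")
            errors.append({"row": index, "problems": problems})
    return valid, errors


rows = [
    {"station": "AMS", "timestamp": "2026-05-01T12:00:00", "temperature_c": 14.2},
    {"station": "RTM", "timestamp": "01-05-2026 13:00", "temperature_c": "15.1"},
    {"station": "UTR", "timestamp": "2026-05-01T12:00:00", "temperature_c": 999},
    {"station": "EIN", "timestamp": "not a date", "temperature_c": 10},
]
valid, errors = validate_all(rows)
```

Two rows survive and two land in the error report with their row index and the failing field, so the good records are kept while the bad ones stay debuggable after the run.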

Interview phrasing for the Week 3 assignment

When asked "tell me about a Python project you have built", the Week 3 assignment gives you a 90-second answer:

I built an ingestion pipeline for a fictional weather-data company. It pulls hourly readings from the Open-Meteo REST API and a CSV partner feed, validates every record against a Pydantic model, and writes the survivors to SQLite. Three things I am proud of. First, it survives transient network failures: the API client classifies errors as transient or permanent, retries transient ones with exponential backoff, and gives up cleanly on permanent ones. Second, the writes are idempotent: re-running the same input produces the same database state, no duplicates, because I used INSERT ... ON CONFLICT DO UPDATE SET keyed on (station, timestamp). Third, the pipeline never crashes on a bad row: I accumulate per-record validation errors into a structured report so I can debug after the run instead of losing every good record before the bad one.

This answer hits six interview-relevant concepts in ninety seconds: ingestion, error classification, retry / backoff, validation, idempotency, and error accumulation. Tailor it to whatever the interviewer brings up in follow-up.
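
If the interviewer asks how the two sources end up in one table, a minimal sketch of the normalization step may help. The payload shape, the CSV column names, and the function names are assumptions for illustration; type coercion is deliberately left to the validation step that follows.

```python
import csv
import io
import json


def records_from_api(payload, station):
    # Assumed JSON shape: parallel lists of timestamps and temperatures.
    hourly = payload["hourly"]
    return [
        {"station": station, "timestamp": ts, "temperature_c": temp}
        for ts, temp in zip(hourly["time"], hourly["temperature_2m"])
    ]


def records_from_csv(text):
    # Assumed partner-feed columns: station_id, observed_at, temp_c.
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "station": row["station_id"].strip(),
            "timestamp": row["observed_at"].strip(),
            "temperature_c": row["temp_c"].strip(),  # still a string; validation coerces it
        }
        for row in reader
    ]


api_payload = json.loads(
    '{"hourly": {"time": ["2026-05-01T12:00", "2026-05-01T13:00"],'
    ' "temperature_2m": [14.2, 14.8]}}'
)
csv_text = "station_id,observed_at,temp_c\nRTM,2026-05-01T12:00,15.1\n"

records = records_from_api(api_payload, station="AMS") + records_from_csv(csv_text)
```

Both sources now produce dictionaries with the same three keys, so a single validation model and a single upsert statement can handle everything downstream.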

Two honest follow-ups if asked "what would you do differently?":

"My retry logic is hand-written. In a real project I would reach for tenacity so the retry policy is a decorator on the fetch function instead of an inline for attempt in range(...) loop. That makes the retry policy visible at the call site and lets me change it in one place."
"My writes go to SQLite. In a team setting I would swap the connection layer for PostgreSQL (Azure Database for PostgreSQL in our shop), keep the same ON CONFLICT upsert pattern, and add a dead-letter table for the records the Pydantic validator rejects, so the next run can replay them after a schema fix."
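
The first follow-up is easier to defend if you can show what "retry policy as a decorator" means. The sketch below is a hand-written, stdlib-only stand-in for that idea and is not tenacity's API; the names `retry`, `TransientError`, and `fetch_readings` are hypothetical.

```python
import functools
import time


class TransientError(Exception):
    """A failure that may succeed on a later attempt."""


def retry(max_attempts=3, base_delay=1.0, retry_on=(TransientError,), sleep=time.sleep):
    """Wrap a function so the retry policy lives in one visible place."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

        return wrapper

    return decorator


attempts = {"count": 0}
waits = []


# The policy is readable at the definition and changed in one place.
@retry(max_attempts=4, base_delay=0.5, sleep=waits.append)
def fetch_readings():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("upstream returned 503")
    return [{"station": "AMS", "temperature_c": 14.2}]


readings = fetch_readings()
```

The function succeeds on the third attempt after backoff delays of 0.5 and 1.0 seconds. tenacity offers the same shape through its own `retry` decorator, configured with helpers such as `stop_after_attempt` and `wait_exponential`.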

What Week 3 does not make you

Week 3 builds the ingestion side, not the whole pipeline. After this week you are not yet:

A scheduled-pipeline operator. Your ingestion runs when you type python pipeline.py. Production ingestion runs on a schedule (Airflow, Prefect, Dagster), retries failed tasks at the task level, alerts on missed SLAs, and shares state across runs through a metadata store. Those concerns arrive in the orchestration week later in the track.
A streaming engineer. You ingest batches: every record in memory, written in one transaction. Streaming ingestion (Kafka, Event Hubs, Kinesis) processes one record at a time forever, and is a different mental model. Not on the Week 3 path.
A schema-contracts engineer. Pydantic validates incoming records against a schema you wrote yourself. Production data platforms use contract and data-quality tooling (Schema Registry, Great Expectations, dbt model contracts) where producers and consumers agree on the schema explicitly. Mentioned in passing later in the track; not practised in Week 3.

Naming these honestly in an interview ("I have built ingestion designed to run unattended against an API and a file feed; I have not yet wired it into a scheduler or set up dead-letter replay") signals more maturity than overclaiming.

Sources

Indeed.nl, "Python data engineer" search: posting frequency and phrasing samples for the NL DE market as of May 2026.
LinkedIn NL, "Junior data engineer" postings: junior-level posting volume and "must-have" vs "nice-to-have" language.
dlt community: qualitative reads on how Python ingestion frameworks are landing in European teams (community channels, GitHub discussions, conference talks).
Honeypot: tech-jobs marketplace with NL-specific market reads on data-engineering demand and tooling-share signals.
Pydantic GitHub stars and adoption surveys: Pydantic v2 adoption signal across the Python ecosystem.

Treat this page as indicative, not statistical. The ~XX% figures will be replaced with measured percentages once the postings-crawler project ships.

<aside> 💭 For generic NL junior data-career content (salary bands, day-to-day work, what employers do not expect from any junior), one shared page across all weeks is the right home. That page does not exist yet; for now, treat this page as Week-3-specific only.

</aside>

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
