Career relevance: Week 1 in the NL data job market
Indicative as of May 2026: see Sources for current numbers.
This page answers two questions students ask every week: why am I learning this, and how does it help me find a job?
It is scoped to Week 1 content (Python foundations: variables and types, control flow, functions and modules, type hints, CLI habits, errors and debugging, logging, file I/O, and Azure tenant access). Other weeks' career pages each cover their own week's tools, not these. Generic NL junior-data career content (salary bands, day-to-day work, what employers do not expect from juniors) belongs in one shared page across the curriculum and is not repeated here.
The numbers below are a rough reading of public NL postings as of May 2026. They are indicative, not measured. A separate project crawls Dutch data postings and will replace the qualitative claims here with measured percentages once the dataset is ready; placeholders are marked ~XX% for that swap.
Almost every NL data posting expects working Python. The variation is in what they mean by "working": a Data Engineer posting expects scripts that other people run in production; a Data Analyst posting expects scripts that produce a chart in a notebook. Week 1 lands closer to the DE end (CLI, logging, and file I/O are production patterns), but the underlying Python (functions, type hints, error handling) shows up in every role.
| Role | Postings expecting Week 1's foundations | What the posting expects |
|---|---|---|
| Data Engineer (DE) | ~95% | Hands-on. "Strong Python", "writes Python comfortably", "understands type hints", "comfortable with the command line". Week 1 is the floor: employers rarely teach this in onboarding. |
| Analytics Engineer (AE) | ~70% | Lighter, but still expected. Most AE work is dbt/SQL, but utility scripts and CI hooks are Python. "Comfortable with Python" usually means the Week 1 register: read a function, run a script from the terminal, fix a KeyError. |
| Data Scientist (DS) | ~95% | Universally expected, with the framing varying by role. Notebook-DS roles expect Week 1 patterns inside notebooks; ML-engineer-DS roles expect the same patterns plus structured projects (Week 2). |
| Data Analyst (DA) | ~40% | Optional but rising. Postings increasingly mention Python for data prep. They ask for the Week 1 register (read a CSV, transform a list, log progress), not the full pipeline patterns from Week 2+. |
The directional shape: DE, AE, and DS postings in NL expect Python comfort at Week 1's level, and DA postings increasingly do. Roles differ in what they pile on top (dbt, pandas, Spark, ML libraries), but Week 1 is the floor they build on. If a posting says "Python skills required" with no further detail, this is the bar.
Week 1's tools are deliberately the standard-library defaults. NL postings generally do not name specific Python flavours; they name the concept and expect you to bring the obvious tool to it.
| Concept | Tool taught | Common NL alternatives | Practical implication |
|---|---|---|---|
| Dependency isolation | venv + pip | uv (gaining ground quickly, ~XX% of newer postings name it), poetry, conda (data-science contexts) | venv + pip is the universal floor; postings that name a specific tool usually mean uv or poetry. The Week 1 muscle memory transfers directly because the workflow is the same in all of them: create an isolated environment, declare dependencies, install them. |
| Type hints | built-in int, str, list[T], `\|` unions | The same syntax everywhere; checkers vary: mypy (CLI), Pylance (VS Code), pyright (Microsoft, faster) | Postings rarely name a specific checker. They say "uses type hints" or "writes type-annotated Python". Once you know the syntax, switching checkers is mostly a configuration change. |
| CLI tooling | argparse (stdlib) | Typer, Click, plain sys.argv | argparse is the universal default. Click and Typer show up in larger codebases (for example, the Flask and dbt CLIs are built on Click). The argument-parsing mental model is the same; only the declaration style changes (decorators instead of add_argument calls). |
| Logging | logging (stdlib) | loguru (more ergonomic API), structlog (structured/JSON logs), cloud-native logging (Azure Monitor, Datadog) | logging is the floor. Production NL postings increasingly mention structlog for JSON output that ELK / Azure Monitor can index, but the level model and the idea of "configurable handlers" transfer 1:1. |
| File I/O | open() + pathlib.Path + csv / json modules | pandas read_csv / to_json (Week 4), polars (faster pandas alternative), pyarrow (columnar) | The stdlib pattern is the right place to start: pandas accepts the same paths and file handles. Once volumes grow past a few hundred MB, you switch to pandas/polars; the underlying file-handle and with-statement habits stay. |
| Cloud access | Azure Portal + Authenticator | AWS (largest share, ~XX% of NL DE postings), GCP (~XX%), Azure (~XX%) | Many postings list two clouds or "willing to learn". The IAM mental model (tenant → subscription → resource group) has close equivalents in AWS and GCP; mostly the names differ. |
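
The Logging row's point, that the level model and configurable handlers carry over to other tools, can be illustrated with the standard library alone. The sketch below is an illustration under assumptions, not Week 1 material: `JsonFormatter`, `build_logger`, and the logger name `week1.demo` are hypothetical names introduced here, and the JSON shape is an assumption rather than what structlog actually emits.

```python
import io
import json
import logging
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each log record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def build_logger(name: str, stream: TextIO, level: int = logging.INFO) -> logging.Logger:
    """Attach a single stream handler with JSON output to a named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Demo: write to an in-memory buffer instead of the terminal.
buffer = io.StringIO()
log = build_logger("week1.demo", buffer, level=logging.INFO)
log.debug("row 3: stripped whitespace")  # below INFO, so it is dropped
log.info("cleaned 10 rows")              # at INFO, so it is kept
lines = buffer.getvalue().splitlines()
print(lines)
```

Only the formatter changed; the calls to `log.debug` and `log.info` look exactly like they do with the default text output, which is the part that transfers between tools.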
What this means for your CV: lead with "Python (type hints, CLI tooling, logging, file I/O)" as a single phrase rather than listing the standard library by module. Recruiters scan for the headline; interviewers want to see whether you can reach for the right tool when asked.
Postings phrase the expectation at three levels:
The chapter does not yet practice packaging (pyproject.toml, lockfiles), production-grade logging (handlers, structured output, log shipping), or dependency injection patterns. Those are the bridge from junior to medior and are introduced in later weeks. Week 1 is the foundation a hiring manager assumes when they invite a junior to a code interview.
Strong line a student can copy-adapt:
Built a CSV-cleaning script in Python: read a file with `pathlib.Path` and a `with` block, transformed records using typed pure functions (`def clean(record: dict[str, str]) -> dict[str, str]`), wrote JSON output, used `argparse` to take `--input` and `--output` paths from the CLI, and logged per-record decisions via Python's `logging` module at INFO and DEBUG levels.
Recruiter keywords this carries: Python, type hints, pathlib, with-statement, argparse, CLI, logging, JSON, CSV.
Weaker alternative for contrast (avoid):
Wrote a Python script that processes CSV files.
The weaker version drops every recruiter keyword and could be claimed by anyone who has ever opened a CSV in a notebook. The strong version names the specific patterns (typed pure functions, argparse, logging) that signal the candidate has seen production-shaped Python.
Three sentences that cover the assignment cleanly when an interviewer asks "tell me about a project you have built":
- "[…] `argparse`, reads the file with `pathlib.Path` and a `with` block, transforms each row through a typed function, and writes JSON output. The structure was deliberately small but production-shaped: I wanted every layer to be the kind I would actually ship."
- "[…] `logging` rather than `print()`: per-row decisions at DEBUG, summary counts at INFO. That choice paid off when one of the records was malformed: I could rerun with `--verbose` and see exactly which row tripped the check."

Two honest follow-ups if asked "what would you do differently?":
- "[…] `dict` for the row schema; in a real project I'd reach for a `@dataclass` (Week 2) or Pydantic model so the field types are declared once instead of being scattered across the transform functions. The hint coverage I have now is documentation; the dataclass version would be enforcement."
- "[…] `logging` module supports both via custom handlers; I just haven't wired one up yet."

Week 1 is the foundation, not the finished article. After this week you are not yet:
These are the role-shaped skills the chapter does not yet make you qualified for. Naming them honestly in an interview ("I have the Python and the cloud-access piece; I have not yet built a scheduled pipeline") is more impressive than a junior overclaim.
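The `dict`-versus-`@dataclass` follow-up from the interview answers can be sketched as follows. This is a hypothetical illustration: the class `CustomerRow`, its fields, and the `from_record` helper are invented for the example. Note that a plain dataclass declares the schema in one place but does not validate types at runtime; here the explicit `int(...)` conversion does the parsing, and a type checker or a Pydantic model would be needed for stricter enforcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRow:
    """Hypothetical row schema: field names and types declared once."""

    name: str
    age: int

    @classmethod
    def from_record(cls, record: dict[str, str]) -> "CustomerRow":
        """Build a row from a raw CSV record; raises on missing or bad fields."""
        return cls(name=record["name"].strip(), age=int(record["age"]))


row = CustomerRow.from_record({"name": " Ada ", "age": "36"})
print(row)
```

A missing column raises `KeyError` and a non-numeric age raises `ValueError` at the point where the row is built, instead of surfacing later inside a transform function.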
- […] `uv`, `poetry`, type checker share).
- […] `westeurope`).

Mark this page indicative, not statistical. Numbers will be replaced with measured percentages once the postings-crawler project ships.
<aside> 💭 For generic NL junior data-career content (salary bands, day-to-day work, what employers do not expect from any junior), one shared page across all weeks is the right home. That page does not exist yet; for now, treat this page as Week-1-specific only.
</aside>
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors