Azure Setup and Account Access
In this assignment, you will build a robust command-line tool to clean a "messy" dataset. This mimics a very common real-world task for data engineers: taking raw, inconsistent data and transforming it into a clean, usable format.
You have been given a file data/messy_users.csv. It contains user data, but it is full of errors: whitespace issues, inconsistent capitalization, missing fields, and badly formatted numbers.
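
The kinds of errors listed above can usually be handled by small, single-purpose functions. The following is a minimal sketch, not a reference solution. The helper `clean_email`, the choice of title case for names, and returning `None` for an unusable salary are assumptions made for illustration; the assignment does not prescribe them.

```python
from typing import Optional


def clean_name(raw: Optional[str]) -> str:
    """Collapse whitespace and normalize capitalization; '' means missing."""
    if raw is None:
        return ""
    # split() with no arguments drops leading, trailing and repeated whitespace.
    # Title case is an assumed normalization, not a stated requirement.
    return " ".join(raw.split()).title()


def clean_email(raw: Optional[str]) -> str:
    """Hypothetical helper: trim whitespace and lowercase the address."""
    if raw is None:
        return ""
    return raw.strip().lower()


def clean_salary(raw: Optional[str]) -> Optional[int]:
    """Strip quotes and commas, then convert to int; None if not a whole number."""
    if raw is None:
        return None
    digits = raw.replace('"', "").replace(",", "").strip()
    try:
        return int(digits)
    except ValueError:
        return None
```

Keeping each function free of file access or logging makes it easy to test on its own, for example `clean_salary("68,000")` returns `68000`.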
Instead of one big script, you will build a modular pipeline:
- src/utils.py: Create functions for cleaning individual fields (e.g., clean_salary, clean_name).
- src/cleaner.py: The main entry point that uses functions from utils.py to process the entire file.

- Strip " or , from numbers (e.g. "68,000" should become the integer 68000).
- If name is empty (after cleaning), skip the row and log a warning.
- If email is empty, skip the row and log a warning.

- utils.py should only contain functions. Your cleaner.py must import these functions.
- Use if __name__ == "__main__": in cleaner.py.
- Do not hardcode paths like data/file.csv. Use the pathlib module to build paths that work on all OSs.
- Use the logging module (not print) to report progress. Add INFO logs for status and WARNING logs for skips.
- Use try/except blocks to handle file-not-found errors gracefully.

We want you to practice using AI as a tool for debugging, not just generating code.
Create AI_DEBUG.md and document your use of AI for debugging.

Data engineering often happens in the cloud. We need to verify you are ready for the upcoming cloud modules.
Save proof of your Azure access as assets/azure_proof.png.

week1-assignment/
├── src/
│ ├── cleaner.py
│ └── utils.py
├── data/
│ └── messy_users.csv
├── output/
│ └── (clean_users.json will be generated here)
├── assets/
│ └── azure_proof.png
├── AI_DEBUG.md
└── README.md
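
Given the layout above, the entry point in src/cleaner.py might be organized roughly as follows. This is a sketch under stated assumptions, not the expected solution. It assumes the CSV has a header row with `name` and `email` columns and that the output is a JSON list. The names `clean_text`, `clean_rows`, and `run` are hypothetical, and `clean_text` stands in for the helpers that would be imported from utils.py so that the example runs on its own.

```python
import csv
import json
import logging
from pathlib import Path

logger = logging.getLogger("cleaner")


def clean_text(value):
    # Stand-in for the field helpers that would be imported from utils.py.
    return " ".join((value or "").split())


def clean_rows(rows):
    """Yield cleaned rows, skipping those without a name or email."""
    for row_no, row in enumerate(rows, start=1):
        name = clean_text(row.get("name"))
        email = clean_text(row.get("email")).lower()
        if not name:
            logger.warning("Row %d skipped: empty name", row_no)
            continue
        if not email:
            logger.warning("Row %d skipped: empty email", row_no)
            continue
        yield {**row, "name": name, "email": email}


def run(input_path: Path, output_path: Path) -> bool:
    """Clean input_path into output_path; return False if the input is missing."""
    try:
        with input_path.open(newline="", encoding="utf-8") as handle:
            cleaned = list(clean_rows(csv.DictReader(handle)))
    except FileNotFoundError:
        logger.error("Input file not found: %s", input_path)
        return False
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
    logger.info("Wrote %d clean rows to %s", len(cleaned), output_path)
    return True


def main() -> None:
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    # Resolve paths from this file's location (assumed to be src/cleaner.py),
    # so the script works regardless of the current working directory.
    project_root = Path(__file__).resolve().parent.parent
    run(project_root / "data" / "messy_users.csv",
        project_root / "output" / "clean_users.json")


if __name__ == "__main__":
    main()
```

Building paths with the `/` operator on `Path` objects avoids hardcoded separators, and taking the paths as parameters in `run` keeps the file handling testable with temporary directories. Salary cleaning is left out here; it would slot into `clean_rows` in the same way as the name and email steps.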
week1/your-name.

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0
