Week 1 – Foundational Python

🎒 Week 1 Assignment: The Data Cleaning Pipeline

In this assignment, you will build a robust command-line tool to clean a "messy" dataset. This mimics a very common real-world task for data engineers: taking raw, inconsistent data and transforming it into a clean, usable format.

Task 1 – The Cleaner Script

Dataset file: week_1__messy_users.csv (place it at data/messy_users.csv)

You have been given a file data/messy_users.csv. It contains user data, but it is full of errors: whitespace issues, inconsistent capitalization, missing fields, and badly formatted numbers.

Create a Python script named src/cleaner.py that reads this CSV file, cleans the data according to the rules below, and writes the valid records to a JSON file.
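A minimal sketch of the overall read → clean → write flow, assuming Python 3.9+ and only the standard library (`csv`, `json`, `logging`, `pathlib`). `clean_row`, `run`, and `main` are hypothetical names, and `clean_row` is a placeholder that passes rows through unchanged until you add the cleaning rules.

```python
import csv
import json
import logging
from pathlib import Path
from typing import Optional

# Paths taken from the assignment's project layout.
INPUT_PATH = Path("data/messy_users.csv")
OUTPUT_PATH = Path("output/clean_users.json")

logger = logging.getLogger("cleaner")


def clean_row(row: dict) -> Optional[dict]:
    """Hypothetical hook for the cleaning rules.

    Return the cleaned record, or None to drop an invalid row.
    This placeholder passes every row through unchanged.
    """
    return dict(row)


def run(input_path: Path, output_path: Path) -> tuple[int, int]:
    """Clean every CSV row and write the kept records as one JSON array.

    Returns (kept, skipped) so the caller can log the counts.
    """
    kept: list[dict] = []
    skipped = 0
    # The csv docs recommend newline="" when opening files for the csv module.
    with input_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            record = clean_row(row)
            if record is None:
                skipped += 1
            else:
                kept.append(record)

    # Create output/ on first run instead of failing.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as f:
        json.dump(kept, f, indent=2)
    return len(kept), skipped


def main() -> None:
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    kept, skipped = run(INPUT_PATH, OUTPUT_PATH)
    logger.info("Wrote %d records to %s, skipped %d", kept, OUTPUT_PATH, skipped)
```

To make the file runnable as `python src/cleaner.py`, call `main()` from an `if __name__ == "__main__":` block at the bottom. Having `run` return the counts instead of logging inside the loop keeps it easy to test on a small temporary file.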

Before writing any code, think about the structure of the clean data you want to produce. In real data engineering work, data is usually cleaned to conform to a target shape that downstream systems expect, so your implementation should produce records with a clear, consistent structure.
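One way to pin down that structure is a small dataclass describing a single clean record. This is a sketch under assumptions: the field names `name`, `email`, `department`, and `salary` come from the cleaning rules rather than from the actual CSV header, `CleanUser` and `to_json` are hypothetical names, and `salary` is typed as `int` because Task 2 lists "salary parsed as string instead of int" as a bug.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CleanUser:
    """Hypothetical target shape for one cleaned record.

    Field names mirror the columns named in the cleaning rules (an assumption);
    salary is an int because a string salary is treated as a bug.
    """
    name: str
    email: str
    department: str
    salary: int


def to_json(users: list[CleanUser]) -> str:
    # asdict() turns each dataclass into a plain dict that json can serialize.
    return json.dumps([asdict(u) for u in users], indent=2)
```

Leaving out a field raises a `TypeError` when the object is built, which surfaces shape mistakes early. Type hints are not checked at runtime, though, so values such as salary still need converting before they go into the constructor.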

Cleaning Rules

  1. Name: Remove any leading/trailing whitespace.
  2. Email: Convert to lowercase.
  3. Department: If missing, set to "Unknown".
  4. Salary:
  5. Validation:
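Rules 1–3 translate into small, separately testable functions. A minimal sketch, assuming the CSV columns are named `name`, `email`, and `department` (check the real header row); the function names are hypothetical. `csv.DictReader` yields `""` for an empty field and `None` when a row is shorter than the header, so both are treated as missing here. Stripping whitespace from the email and department is an extra assumption beyond the stated rules, and salary parsing and validation are not covered.

```python
from typing import Optional

UNKNOWN_DEPARTMENT = "Unknown"


def clean_name(raw: Optional[str]) -> str:
    # Rule 1: strip leading/trailing whitespace (None becomes "").
    return (raw or "").strip()


def clean_email(raw: Optional[str]) -> str:
    # Rule 2: lowercase. Stripping first is an extra assumption.
    return (raw or "").strip().lower()


def clean_department(raw: Optional[str]) -> str:
    # Rule 3: treat None, empty, or whitespace-only values as missing.
    value = (raw or "").strip()
    return value if value else UNKNOWN_DEPARTMENT


def apply_basic_rules(row: dict[str, Optional[str]]) -> dict[str, str]:
    """Apply rules 1-3 to one CSV row; salary and validation are left out."""
    return {
        "name": clean_name(row.get("name")),
        "email": clean_email(row.get("email")),
        "department": clean_department(row.get("department")),
    }
```

For example, `apply_basic_rules({"name": "  Ada ", "email": "ADA@Example.COM", "department": ""})` returns `{"name": "Ada", "email": "ada@example.com", "department": "Unknown"}`.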

Technical Requirements

Task 2 – AI Debugging Report

We want you to practice using AI as a tool for debugging, not just generating code.

  1. Introduce a bug into your code intentionally (or use a real one you encountered).

    Examples: salary parsed as a string instead of an int, wrong counts in the log output, malformed JSON output, rows skipped incorrectly, ...

  2. Ask an LLM (ChatGPT, Claude, etc.) to help you fix it.

  3. Create a file AI_DEBUG.md and document:

Task 3 – Azure Setup

Much data engineering work happens in the cloud, so we need to verify that you are ready for the upcoming cloud modules.

  1. Log into portal.azure.com.
  2. Take a screenshot of the portal dashboard that shows you are logged in to your account.
  3. Save the image as assets/azure_proof.png.

Submission

  1. Ensure your project structure looks like this:

    week1-assignment/
    ├── src/
    │   └── cleaner.py
    ├── data/
    │   └── messy_users.csv
    ├── output/
    │   └── (clean_users.json will be generated here)
    ├── assets/
    │   └── azure_proof.png
    ├── AI_DEBUG.md
    └── README.md
    

    README.md should include: how to run the script, example commands, any assumptions you made during cleaning, and so on.

  2. Create a git branch week1/your-name.

  3. Commit your changes.

  4. Push to the repository and open a Pull Request.


CC BY-NC-SA 4.0

*https://hackyourfuture.net/*
