Organizing code into functions and modules is essential for building maintainable data pipelines.
You learned functions in Core. Here's what's important for data engineering:
def clean_value(value: str, default: str = "") -> str:
    """Clean and normalize a string value.

    Args:
        value: The string to clean
        default: Value to return if input is empty

    Returns:
        Cleaned string, lowercase and stripped
    """
    if not value:
        return default
    return value.strip().lower()
<aside> 💡 Always write docstrings. They help your future self and teammates.
</aside>
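As a rough sketch of how a small function like this slots into a pipeline, the example below applies `clean_value` to every field of a record and then reads the docstring back at runtime. The `clean_record` helper, the `"unknown"` default, and the sample data are hypothetical, added only for illustration.

```python
def clean_value(value: str, default: str = "") -> str:
    """Clean and normalize a string value."""
    if not value:
        return default
    return value.strip().lower()


def clean_record(record: dict[str, str], default: str = "unknown") -> dict[str, str]:
    """Return a new dict with every value cleaned (hypothetical helper)."""
    return {key: clean_value(value, default) for key, value in record.items()}


# Sample data invented for this sketch.
raw = {"name": "  Ada LOVELACE ", "city": ""}
cleaned = clean_record(raw)
print(cleaned)

# The docstring is stored on the function and is what help() displays.
print(clean_value.__doc__)
```

Note that only an empty string triggers the default here. A whitespace-only string such as `"   "` is truthy, so it is stripped down to `""` rather than replaced.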
A module is simply a .py file. You can import functions from it.
# `utils.py`
def clean_value(value):
    return value.strip().lower()

# `main.py`
from utils import clean_value

print(clean_value(" HELLO "))
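The two-file setup above can be tried in a single script. The sketch below writes a throwaway module to a temporary directory and then imports it in two styles. The name `demo_utils` is a stand-in for `utils.py`, chosen only to avoid clashing with any other module on your machine. In a real project the file would simply sit next to `main.py`, and the temporary-directory setup would not be needed.

```python
import sys
import tempfile
from pathlib import Path

# Create a throwaway module so this sketch runs on its own.
# "demo_utils" is a hypothetical stand-in for utils.py.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_utils.py").write_text(
    "def clean_value(value):\n"
    "    return value.strip().lower()\n"
)
sys.path.insert(0, str(module_dir))  # make the directory importable

# Style 1: import the module and call the function by its qualified name.
import demo_utils

print(demo_utils.clean_value(" HELLO "))

# Style 2: import only the function into the current namespace.
from demo_utils import clean_value

print(clean_value(" WORLD "))
```

Both styles reach the same function object. The qualified form makes the origin of each call visible at the call site, while the direct form is shorter to type.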
__name__ == "__main__" Pattern

This pattern lets a file work both as a module AND as a script:
# `utils.py`
def clean_value(value):
    return value.strip().lower()

if __name__ == "__main__":
    # Only runs when executed directly
    print(clean_value(" TEST "))
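To see what `__name__` actually holds in each case, the following sketch loads one file both ways. It assumes a hypothetical file called `name_demo.py`, created in a temporary directory, and uses the standard library's `importlib.import_module` and `runpy.run_path` to imitate "imported" versus "executed directly".

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Write a tiny hypothetical file that records the value of __name__.
module_dir = Path(tempfile.mkdtemp())
script = module_dir / "name_demo.py"
script.write_text(
    "WHO_AM_I = __name__\n"
    "if __name__ == '__main__':\n"
    "    print('running as a script')\n"
)
sys.path.insert(0, str(module_dir))

# Case 1: imported. __name__ is the module's own name, so the guard is skipped.
imported = importlib.import_module("name_demo")
print(imported.WHO_AM_I)

# Case 2: run like `python name_demo.py`. __name__ is "__main__", so the guard runs.
as_script = runpy.run_path(str(script), run_name="__main__")
print(as_script["WHO_AM_I"])
```

Only the second case prints the message from inside the guard, which is why code under `if __name__ == "__main__":` stays silent when another file imports the module.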
<aside> ⌨️ Hands on: Create utils.py with a function, import it in main.py.
</aside>
<aside> Try it in the widget: https://lasse.be/simple-hyf-teach-widget/?week=1&chapter=functions_and_modules&exercise=modules_demo&lang=python
</aside>
1. What does the `if __name__ == "__main__":` block allow a single file to do, and what is `__name__` set to when the file is imported?
2. Given `utils.py` with a function `clean()`, give two different ways to import and use it from `main.py`. When would you reach for each?
3. Why is a mutable default argument (e.g. `def add_record(records=[]):`) a footgun?

Next lesson: Type Hints
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors