The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

<aside> 🚧 Planned restructure (April 2026). The five-chapter scaffold below (Spark → Databricks → streaming theory → Kafka) reads as a tool tour rather than a coherent skill set. Tracking issues reshape this into a single-substrate week teaching analytics engineering at platform scale on Databricks:
Kafka moves to week_13__going_further.md with pointers to Confluent's tutorials. Spark-standalone is absorbed into the PySpark chapter (Spark is what runs inside a Databricks notebook: no reason to teach them as separate topics).
See issue #112 (the dbt-on-Databricks chapter specifically) and issue #113 (the full-week restructure) for rationale, open decisions, and phasing. Do not re-scaffold the old tool-tour shape: any new work on Week 13 should land the chapter structure above.
</aside>
Welcome to Week 13! So far you have worked with datasets that fit comfortably on a single machine. This week introduces two areas where traditional approaches break down: big data processing with Apache Spark and streaming data with platforms like Kafka and Azure Event Hubs. These are advanced topics that expand your toolkit for handling scale and real-time data.
By the end of this week, you will understand how distributed computing works, run transformations in a Databricks notebook, and explore streaming concepts through Kafka theory and hands-on practice with Azure Event Hubs.