Week 13 - Big Data & Streaming

Introduction to Big Data and Streaming

Content coming soon...

Suggested Topics

What big data is: the point where single-machine tools (pandas, SQLite) can no longer hold or process the data in a reasonable time
The 3 Vs (volume, velocity, variety) and why they matter for tool selection
Batch vs streaming: processing data at rest vs data in motion
When you need distributed computing vs when you do not
Overview of the big data ecosystem: Spark, Hadoop, Flink, Kafka, cloud-native services
The modern data stack and where big data and streaming fit in
Real-world examples: log processing, clickstream analytics, IoT sensor data
Cost and complexity trade-offs: not every problem needs a distributed solution
Cost awareness: Databricks clusters cost money for every hour they run, and free trials have limits, so always terminate clusters when not in use
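
To make the "batch vs streaming" topic above concrete, here is a minimal sketch in plain Python, not taken from the course material. The function names, the `sensor_feed` source, and the temperature values are all hypothetical. The batch function sees the whole dataset at rest and answers once. The streaming function updates its answer as each value arrives, so it also works when the source never ends.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: the full dataset is at rest, so compute the answer in one go."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: keep small running state and emit an updated answer per event."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


def sensor_feed() -> Iterator[float]:
    """Hypothetical stand-in for data in motion, such as an IoT temperature sensor."""
    for value in [20.0, 22.0, 21.0, 25.0]:
        yield value


if __name__ == "__main__":
    # Batch: one result after all data is available.
    print(batch_average([20.0, 22.0, 21.0, 25.0]))  # 22.0

    # Streaming: one result per incoming reading.
    for running in streaming_average(sensor_feed()):
        print(running)  # 20.0, 21.0, 21.0, 22.0
```

The streaming version stores only a count and a total, never the full history. That is the same trade-off that distributed streaming tools make at a much larger scale.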

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*
