Introduction to Big Data and Streaming
Content coming soon...
Suggested Topics
- What is big data: the point where single-machine tools (pandas, SQLite) stop working
- The 3 Vs (volume, velocity, variety) and why they matter for tool selection
- Batch vs streaming: processing data at rest vs data in motion
- When you need distributed computing vs when you do not
- Overview of the big data ecosystem: Spark, Hadoop, Flink, Kafka, cloud-native services
- The modern data stack and where big data and streaming fit in
- Real-world examples: log processing, clickstream analytics, IoT sensor data
- Cost and complexity trade-offs: not every problem needs a distributed solution
- Cost awareness: Databricks clusters cost money for every hour they run, and the free trial has limits. Always terminate clusters when not in use.
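
To make the batch vs streaming topic above concrete, here is a minimal sketch in plain Python. It is not Spark or Kafka code. The function names and the sensor readings are made up for illustration. The batch function sees the whole dataset at once (data at rest), while the streaming function receives one value at a time and emits an updated result after each one (data in motion).

```python
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Batch: the full dataset is available, so compute over all of it at once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: values arrive one by one; keep small running state
    (count and total) and emit an updated average after each value."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical IoT sensor readings, invented for this example.
sensor_readings = [20.0, 22.0, 21.0, 25.0]

# Batch: one answer, available only after all data has been collected.
batch_result = batch_average(sensor_readings)

# Streaming: a new answer after every reading; iter() stands in for a live feed.
stream_results = list(streaming_average(iter(sensor_readings)))

print("batch:", batch_result)
print("streaming:", stream_results)
```

Note that the streaming version never stores the full dataset, only a count and a total. Real streaming engines apply the same idea at scale, which is one reason they can handle inputs that never end.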
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*