Streaming Theory
Content coming soon...
Suggested Topics
- What is data streaming: processing events as they arrive rather than in batches
- Pub/sub pattern: publishers send messages to topics, subscribers consume them
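- Message queues vs event streams: a queue typically hands each message to one consumer and removes it once acknowledged, while an event stream keeps an ordered, replayable log that many consumers can read independently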
- Brokers: the middleware that stores and delivers messages (Kafka, RabbitMQ, Azure Event Hubs)
- Key concepts: topics, partitions, offsets, consumer groups, and delivery guarantees
- Delivery semantics: at-most-once (a message may be lost but is never duplicated) and at-least-once (a message may be duplicated but is never lost). Optional/advanced: exactly-once
- Windowing: aggregating streaming data over time intervals
- Azure Event Hubs as a managed pub/sub service: how it maps to Kafka concepts (an event hub corresponds to a Kafka topic; consumer groups and partitions keep the same names and roles)
- When to use streaming vs batch: latency requirements, data volume, and complexity trade-offs
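Several of the topics above (partitions, offsets, consumer groups, delivery guarantees, and windowing) can be made concrete with a small in-memory model. The sketch below is illustrative only and is not how Kafka or Azure Event Hubs is implemented: the names `Topic`, `ConsumerGroup`, and `tumbling_window_counts` are hypothetical, a simple byte sum stands in for a real partitioning hash, and timestamps are assumed to be integer seconds.

```python
from collections import defaultdict


class Topic:
    """A toy topic: a fixed number of append-only partitions (hypothetical)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # The same key always maps to the same partition, so per-key order holds.
        # A byte sum replaces a real hash function to keep runs reproducible.
        index = sum(key.encode()) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class ConsumerGroup:
    """Tracks one committed offset per partition for a group (hypothetical)."""

    def __init__(self, topic):
        self.topic = topic
        self.committed = defaultdict(int)  # partition index -> next offset to read

    def poll(self, partition, handler, commit_first):
        """Process the next pending message of `partition`, if there is one.

        commit_first=True  -> at-most-once: a crash in handler loses the message.
        commit_first=False -> at-least-once: a crash in handler causes redelivery.
        """
        offset = self.committed[partition]
        log = self.topic.partitions[partition]
        if offset >= len(log):
            return False  # nothing new to read
        if commit_first:
            self.committed[partition] = offset + 1
        handler(log[offset])
        if not commit_first:
            self.committed[partition] = offset + 1
        return True


def tumbling_window_counts(events, window_seconds):
    """Count (timestamp, payload) events per fixed, non-overlapping window."""
    counts = defaultdict(int)
    for timestamp, _payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)
```

The only difference between the two delivery modes in this model is whether the offset is committed before or after the handler runs. Committing afterwards means a failed handler leaves the offset unchanged, so the same message is read again on the next poll; this is why at-least-once consumers usually need idempotent processing.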
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*