Week 13 Lesson Plan (Teachers)
Content
Practice
Content coming soon...
Suggested Activities
- Run a PySpark transformation on a sample dataset in a Databricks notebook
- Compare pandas and PySpark performance on the same transformation at different data sizes
- Explore Azure Event Hubs: create an Event Hub and send a test message
- Consume messages from Azure Event Hubs using a Python script
- Explore the Databricks UI: cluster management, notebook execution, and data browsing
- Compare batch and streaming processing: write up the trade-offs for a given use case
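
For the two Azure Event Hubs activities above, a starting point could look like the sketch below. It assumes the `azure-eventhub` package is installed (`pip install azure-eventhub`) and that an Event Hub already exists. The connection string, hub name, and all function names are hypothetical placeholders rather than part of the curriculum, and the JSON payload shape is only one possible choice.

```python
import json
from datetime import datetime, timezone

# Hypothetical placeholders: replace with values from your own Event Hubs namespace.
CONNECTION_STR = "<your-event-hubs-connection-string>"
EVENT_HUB_NAME = "<your-event-hub-name>"


def build_test_payload(message: str) -> str:
    """Serialize a small test event as a JSON string."""
    return json.dumps(
        {"message": message, "sent_at": datetime.now(timezone.utc).isoformat()}
    )


def parse_payload(body: str) -> dict:
    """Decode a JSON event body back into a dictionary."""
    return json.loads(body)


def send_test_message(message: str) -> None:
    """Send one test event to the Event Hub."""
    # Imported here so the helpers above work without the SDK installed.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
    )
    with producer:
        batch = producer.create_batch()
        batch.add(EventData(build_test_payload(message)))
        producer.send_batch(batch)


def consume_messages() -> None:
    """Print every retained event, starting from the oldest one."""
    from azure.eventhub import EventHubConsumerClient

    def on_event(partition_context, event):
        # Called once per received event; shows which partition it came from.
        print(partition_context.partition_id, parse_payload(event.body_as_str()))

    consumer = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR, consumer_group="$Default", eventhub_name=EVENT_HUB_NAME
    )
    with consumer:
        # "-1" starts from the beginning of each partition.
        # receive() blocks until the script is interrupted (for example with Ctrl+C).
        consumer.receive(on_event=on_event, starting_position="-1")
```

Students could call `send_test_message("hello")` in one terminal and `consume_messages()` in another to watch the event arrive. The sketch does not configure a checkpoint store, so every run of the consumer re-reads the retained events from the start, which is usually acceptable for a classroom exercise.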
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*