Databricks
Content coming soon...
Suggested Topics
- What is Databricks: a managed Spark platform with notebooks, clusters, and collaboration
- Databricks workspace: navigating the UI, creating notebooks, and managing clusters
- Running PySpark in a Databricks notebook: reading data, transformations, and writing results
- Unity Catalog (optional/advanced): enterprise data governance; not needed for this course
- Delta Lake: brief mention of ACID transactions and versioned data on top of Spark
- Connecting Databricks to cloud storage (Azure Blob, S3)
- Databricks vs running Spark yourself: managed convenience vs infrastructure control
- Lab walkthrough: running a data transformation in a Databricks notebook
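
As a rough illustration of the notebook, Delta Lake, and lab topics above, the following is a minimal PySpark sketch, not the course's official lab solution. The column names (`status`, `order_date`, `amount`), the function names, and the paths passed in are all hypothetical. It assumes it runs in a Databricks notebook, where a `SparkSession` is already available as `spark` and the Delta format is supported.

```python
# Minimal sketch of a read -> transform -> write flow in a Databricks notebook.
# Assumptions (not from the course material): the input is a CSV file with
# columns `status`, `order_date`, and `amount`; all names here are hypothetical.


def daily_revenue(orders_df):
    """Return total revenue per order date, counting only completed orders."""
    return (
        orders_df
        .filter("status = 'completed'")               # SQL-style filter expression
        .groupBy("order_date")
        .agg({"amount": "sum"})                       # produces a column named sum(amount)
        .withColumnRenamed("sum(amount)", "revenue")  # give it a friendlier name
    )


def run_pipeline(spark, source_path, target_path):
    """Read a CSV file, aggregate it, and write the result as a Delta table."""
    orders = spark.read.csv(source_path, header=True, inferSchema=True)
    result = daily_revenue(orders)
    # Overwriting a Delta table keeps the previous data as an older version.
    result.write.format("delta").mode("overwrite").save(target_path)
    return result


def read_version(spark, target_path, version):
    """Read an earlier version of a Delta table (Delta Lake "time travel")."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(target_path)
    )
```

In a notebook cell this could be called as `run_pipeline(spark, "<source path>", "<target path>")`. For cloud storage, the paths typically look like `abfss://<container>@<account>.dfs.core.windows.net/<path>` on Azure or `s3://<bucket>/<path>` on AWS; the angle-bracket parts are placeholders, and access to the storage account has to be configured separately.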
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0

*https://hackyourfuture.net/*