Cost Awareness

As a data engineer, you often have the ability to create databases, virtual machines, and storage accounts. Each of those resources has a price, and costs add up quickly if you are not paying attention. Cost awareness is part of the job.

By the end of this chapter, you should understand how to estimate costs, choose the right resource size, and avoid unnecessary spending.

Concepts

Costs are part of the job

In this course, your Azure costs are covered by the shared tenant. In a real job, they are not. A pipeline that works but costs ten times more than it should is not a good pipeline.

Check pricing before you create

Azure publishes pricing pages for every service. Before you provision a Postgres server or a storage account, look up what the SKU costs per hour or per month.

The Azure pricing calculator lets you estimate costs before committing. Use it.

<aside> 🖼️ Interactive: Week 6 Cost Calculator

</aside>

Try adjusting the sliders to see how different choices affect your monthly bill.

<aside> ⌨️ Hands on: Open the cost calculator above and experiment with different settings. Then open the official Azure pricing calculator and look up the monthly cost of a Standard_B1ms Postgres Flexible Server in West Europe. Compare the numbers.

</aside>

What your Week 6 setup actually costs

Here is a realistic estimate for the resources you use in this track:

| Resource | SKU / tier | Monthly cost (West Europe) |
| --- | --- | --- |
| PostgreSQL Flexible Server | Standard_B1ms (1 vCore, 2 GB RAM) | ~€13 |
| Blob Storage (LRS) | 10 GB stored + light usage | ~€0.20 |
| Container Registry (Basic) | Shared across class | ~€5 (shared) |
| Container App Job | 1 run/day, 60 seconds each | Free (within the 180,000 vCPU-seconds/month free grant) |
| Total | | ~€13/month (excluding the shared registry) |

Almost all the cost comes from the Postgres server, because it runs 24/7 whether your pipeline uses it or not. Blob storage and Container App Jobs are nearly free at this scale.
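The numbers in the table come from simple arithmetic: an hourly rate times the hours in a month, and a job's vCPU-seconds compared with the free grant. The sketch below reproduces that arithmetic. It assumes ~€0.018/hour for Standard_B1ms compute (the rate used later in this chapter) and 730 hours per month (the convention of the Azure pricing calculator); the 0.5 vCPU job size is a hypothetical value, so substitute your own job's CPU setting and current prices.

```shell
#!/bin/sh
# Rough monthly cost arithmetic. The rates are assumptions taken from this
# chapter; check current prices in the Azure pricing calculator.

HOURS_PER_MONTH=730        # hours-per-month convention of the pricing calculator
PG_RATE_PER_HOUR=0.018     # assumed Standard_B1ms compute rate in EUR
FREE_VCPU_SECONDS=180000   # monthly Container Apps free grant

# Monthly cost of a resource that runs every hour of the month.
# Arg: hourly rate in EUR.
monthly_always_on() {
  awk -v rate="$1" -v hours="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", rate * hours }'
}

# vCPU-seconds a scheduled job uses per month.
# Args: vCPU per run, seconds per run, runs per day, days per month.
job_vcpu_seconds() {
  awk -v cpu="$1" -v secs="$2" -v runs="$3" -v days="$4" \
    'BEGIN { printf "%d\n", cpu * secs * runs * days }'
}

echo "Postgres compute: ~EUR $(monthly_always_on "$PG_RATE_PER_HOUR") per month"
# Hypothetical job size: 0.5 vCPU, 60 seconds, once a day, 30 days.
echo "Job usage: $(job_vcpu_seconds 0.5 60 1 30) of $FREE_VCPU_SECONDS free vCPU-seconds"
```

Even at a full 1 vCPU, a 60-second daily run uses 1,800 vCPU-seconds a month, about 1% of the free grant. That is why the always-on server dominates the bill.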

Which resources bill when idle

Not all Azure resources cost money the same way. Some bill continuously, others only when used:

| Resource | Bills when idle? | Why |
| --- | --- | --- |
| PostgreSQL Flexible Server | Yes: runs 24/7 | It is a running VM with reserved compute |
| Blob Storage | Yes: per GB stored | You pay for data at rest, even if nobody reads it |
| Container App Job | No: only during execution | No compute allocated between runs |
| Container Registry | Yes: fixed monthly fee | The registry exists whether you push images or not |
| Container Apps Environment | No: consumption plan | No charge when no apps or jobs are running |

The key takeaway: Postgres is the expensive one. Your teacher manages the shared server, but in a real project, stopping or scaling down the database when it is not needed is the single biggest cost saver.

Choose the smallest SKU that works

A Standard_B1ms Postgres server costs a fraction of a Standard_D4s_v3. Start small and scale up only when you have evidence you need more.

The same applies to storage: do not pick geo-redundant replication for a dev environment where locally redundant storage (LRS) is enough.

Stop what you are not using

A database that runs 24/7 for a pipeline that runs once a day is wasting money. Many Azure services can be stopped without deleting them. When stopped, you keep your data and configuration but stop paying for compute.

For PostgreSQL Flexible Server, you can stop and start it from the CLI:

# Stop the server (no compute charges while stopped)
az postgres flexible-server stop \
  --resource-group rg-weather-dev \
  --name hyf-data-pg

# Start it again when you need it
az postgres flexible-server start \
  --resource-group rg-weather-dev \
  --name hyf-data-pg

A stopped Postgres server still stores your data (you pay for storage), but the compute cost drops to zero. For a Standard_B1ms server, you save about €0.018 for every hour it is stopped. If you run it 8 hours a day instead of 24, that is 16 stopped hours a day, or roughly 16 × 30 × €0.018 ≈ €8.60 per month.
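To redo this calculation for your own schedule, multiply the hourly compute rate by the number of hours the server spends stopped. A minimal sketch, assuming the ~€0.018/hour Standard_B1ms rate and a 30-day month (real months vary, and so do prices):

```shell
#!/bin/sh
# Monthly compute savings from running a server only part of each day.
# Args: hourly compute rate (EUR), hours running per day, days in the month.
monthly_savings() {
  awk -v rate="$1" -v hours_on="$2" -v days="$3" \
    'BEGIN { printf "%.2f\n", rate * (24 - hours_on) * days }'
}

# Assumed ~EUR 0.018/hour for Standard_B1ms, running 8 hours a day, 30-day month.
echo "Saved: ~EUR $(monthly_savings 0.018 8 30) per month"
```

Storage is not part of this number: it keeps billing whether the server runs or not.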

<aside> ⚠️ Azure automatically restarts a stopped Flexible Server after 7 days if you do not start it yourself. Set a reminder or script the stop/start cycle.

</aside>
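One way to script the stop/start cycle is a small script that runs on a schedule (for example from cron or a CI scheduler), compares the current hour with a working window, and issues the matching az postgres flexible-server command. In this sketch the 08:00 to 16:00 window is a hypothetical choice, and the script only prints the command (a dry run); drop the echo in front of az to execute it.

```shell
#!/bin/sh
# Decide whether the Postgres server should run at a given hour (0-23).
# The working window is a hypothetical choice: 08:00 to 16:00.
WINDOW_START=8
WINDOW_END=16

desired_action() {
  hour=${1#0}          # drop one leading zero so "08" compares as 8
  hour=${hour:-0}      # "0" becomes empty after stripping; treat it as 0
  if [ "$hour" -ge "$WINDOW_START" ] && [ "$hour" -lt "$WINDOW_END" ]; then
    echo start
  else
    echo stop
  fi
}

# Dry run: print the command instead of executing it.
action=$(desired_action "$(date +%H)")
echo az postgres flexible-server "$action" \
  --resource-group rg-weather-dev \
  --name hyf-data-pg
```

Scheduling it twice a day, shortly after the window opens and shortly after it closes, keeps the server running only during working hours. Note that date +%H uses the machine's time zone, which on many hosted CI runners is UTC.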

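The same principle applies to some other compute resources. VMs can be deallocated (a VM that is only stopped from inside the operating system, or with az vm stop, keeps billing for compute), and some managed databases, such as PostgreSQL and MySQL Flexible Server, have a stop option. Not everything can be paused, though: a paid App Service plan keeps billing even when the apps on it are stopped. Storage (blobs, disks) always bills for data at rest, but stopping the compute layer is where the real savings are.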

When to stop vs delete:

| Action | Use when | Data preserved? |
| --- | --- | --- |
| Stop | You need the resource again soon (e.g. next class) | Yes |
| Delete | You are done with it permanently | No |

For this course, your teacher manages the shared server. But in a real project, scheduling stop/start around your pipeline's actual usage hours is the single easiest cost optimization.

Use tags to track spending

Tags like team=data and project=weather let you filter cost reports and see where money is going.

# Add tags to a resource group
az group update --name rg-weather-dev \
  --tags team=data project=weather env=dev

Add tags early, not at the end.
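Tags pay off when you group costs by them. As an illustration, assume you have exported costs to a simplified CSV with the hypothetical columns resource,tag,cost_eur and one tag per row (real Azure cost exports use a different, richer layout). A few lines of awk can then total the cost per tag:

```shell
#!/bin/sh
# Total cost per tag from a simplified, hypothetical CSV export.
# Assumed columns: resource,tag,cost_eur (the first line is a header).
cost_by_tag() {
  awk -F, 'NR > 1 { total[$2] += $3 }
           END { for (t in total) printf "%s %.2f\n", t, total[t] }' "$1" | sort
}

# Made-up example data for illustration.
csv=$(mktemp)
cat > "$csv" <<'EOF'
resource,tag,cost_eur
hyf-data-pg,project=weather,13.14
weatherblobs,project=weather,0.20
billing-db,project=billing,25.00
EOF

cost_by_tag "$csv"
rm -f "$csv"
```

Untagged resources would all land in one anonymous bucket, which is exactly why tagging early matters.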

Set budget alerts

Azure lets you create budget alerts that notify you when spending crosses a threshold. This catches runaway costs before they become a problem.
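A budget alert fires when spending reaches a percentage of the budget that you choose as a threshold. The logic can be sketched as below; the €20 budget and the 50/80/100% thresholds are hypothetical examples, and in Azure you configure thresholds on the budget itself rather than running a script like this.

```shell
#!/bin/sh
# Print each alert threshold (as a percentage of the budget) that spending
# has reached. Args: spend so far (EUR), budget (EUR), thresholds in percent.
crossed_thresholds() {
  spend=$1
  budget=$2
  shift 2
  for pct in "$@"; do
    # awk exits 0 (success) when spend >= budget * pct / 100.
    if awk -v s="$spend" -v b="$budget" -v p="$pct" \
         'BEGIN { exit !(s + 0 >= b * p / 100) }'; then
      echo "$pct%"
    fi
  done
}

# Hypothetical example: EUR 17 spent against a EUR 20 monthly budget.
crossed_thresholds 17 20 50 80 100
```

Setting more than one threshold gives you an early warning (50%) as well as a final one (100%), so there is still time to react before the budget is gone.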

<aside> 💡 Using AI to help: Paste your az resource creation commands into an LLM and ask "What will this cost per month in West Europe?" It can estimate based on published pricing. Always verify against the official pricing calculator. (⚠️ Ensure no PII or sensitive company data is included!)

</aside>

The cost conversation in teams

In professional settings, cost decisions are often shared across the team.

Knowing what things cost and proactively optimizing is a signal of professional maturity.

<aside> 🤓 Curious Geek: Cloud cost horror stories

Cloud cost surprises are more common than you think. Troy Hunt (creator of Have I Been Pwned) documented how his cloud costs spiralled due to unexpected bandwidth charges on a service he assumed was cheap. A misconfigured auto-scaling rule or a forgotten GPU cluster can generate massive bills overnight. Companies like Netflix and Spotify employ dedicated "FinOps" teams to manage cloud spending. The lesson: always check pricing, always set alerts.

</aside>

Exercises

  1. Use the Azure pricing calculator to estimate the monthly cost of your Week 6 setup: one Standard_B1ms Postgres server + 10 GB blob storage + a Container App Job running once daily for 60 seconds.
  2. Add team and env tags to your resource group using the CLI.
  3. Find the pricing page for Azure Container Apps and explain how job billing works.

🧠 Knowledge Check

  1. Your Postgres server costs ~€13/month. Your pipeline only runs for 30 seconds a day. What can you do to reduce the Postgres cost without deleting the server?
  2. What is the difference between stopping and deleting an Azure resource? When would you choose one over the other?
  3. A teammate creates a Container App Job with --trigger-type Schedule and a cron expression that runs every minute. They forget about it over the weekend. What happens to the bill, and how could budget alerts have helped?

Extra reading


The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with ❤️ by the HackYourFuture community · Thank you, contributors
