Cost Awareness

As a data engineer, you often have the ability to create databases, virtual machines, and storage accounts. Each of those resources has a price, and costs add up quickly if you are not paying attention. Cost awareness is part of the job.

By the end of this chapter, you should understand how to estimate costs, choose the right resource size, and avoid unnecessary spending.

Concepts

Costs are part of the job

In this course, your Azure costs are covered by the shared tenant. In a real job, they are not. A pipeline that works but costs ten times more than it should is not a good pipeline.

Check pricing before you create

Azure publishes pricing pages for every service. Before you provision a Postgres server or a storage account, look up what the SKU costs per hour or per month.

The Azure pricing calculator lets you estimate costs before committing. Use it.

<aside> 🖼️ Interactive: Week 6 Cost Calculator

</aside>

Try adjusting the sliders to see how different choices affect your monthly bill.

<aside> ⌨️ Hands on: Open the cost calculator above and experiment with different settings. Then open the official Azure pricing calculator and look up the monthly cost of a Standard_B1ms Postgres Flexible Server in West Europe. Compare the numbers.

</aside>

What your Week 6 setup actually costs

Here is a realistic estimate for the resources you use in this track:

| Resource | SKU / tier | Monthly cost (West Europe) |
| --- | --- | --- |
| PostgreSQL Flexible Server | Standard_B1ms (1 vCore, 2 GB RAM) | ~€13 |
| Blob Storage (LRS) | 10 GB stored + light usage | ~€0.02 |
| Container Registry (Basic) | Shared across class | ~€5 (shared) |
| Container App Job | 1 run/day, 60 seconds each | Free (within the 180,000 vCPU-seconds/month free tier) |
| Total | | ~€13/month (shared registry not counted) |

Almost all the cost comes from the Postgres server, because it runs 24/7 whether your pipeline uses it or not. Blob Storage is nearly free at this scale because you store very little data and pay per GB. Container App Jobs are nearly free because they are serverless: you pay only for the seconds the job actually runs.
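To see why the Container App Job stays inside the free tier, here is a minimal sketch of the arithmetic. It assumes the job is allocated 1 vCPU and that a month has 30 days; neither figure comes from the course setup, so adjust both to match your job. Memory is metered separately and is ignored here.

```python
FREE_VCPU_SECONDS_PER_MONTH = 180_000  # free grant quoted in the table above


def monthly_vcpu_seconds(runs_per_day: float, seconds_per_run: float,
                         vcpus: float = 1.0, days: int = 30) -> float:
    """vCPU-seconds a scheduled job consumes in one month.

    vcpus=1.0 and days=30 are assumptions for illustration.
    """
    return runs_per_day * seconds_per_run * vcpus * days


def free_tier_share(used_vcpu_seconds: float) -> float:
    """Fraction of the monthly free grant in use (1.0 means all of it)."""
    return used_vcpu_seconds / FREE_VCPU_SECONDS_PER_MONTH


used = monthly_vcpu_seconds(runs_per_day=1, seconds_per_run=60)
print(f"{used:.0f} vCPU-seconds, {free_tier_share(used):.1%} of the free grant")
```

One 60-second run per day uses about 1,800 vCPU-seconds, roughly 1% of the grant, so the job would have to run about a hundred times more before it showed up on the bill.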

Which resources bill when idle

Not all Azure resources cost money the same way. Some bill continuously, others only when used:

| Resource | Bills when idle? | Why |
| --- | --- | --- |
| PostgreSQL Flexible Server | Yes: runs 24/7 | It is a running VM with reserved compute |
| Blob Storage | Yes: per GB stored | You pay for data at rest, even if nobody reads it |
| Container App Job | No: only during execution | No compute allocated between runs |
| Container Registry | Yes: fixed monthly fee | The registry exists whether you push images or not |
| Container Apps Environment | No: consumption plan | No charge when no apps or jobs are running |

The key takeaway: Postgres is the expensive one. Your teacher manages the shared server, but in a real project, stopping or scaling down the database when it is not needed is the single biggest cost saver.
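The idle-billing table can be turned into a quick calculation of what a month costs when the pipeline never runs. This is a sketch using the rounded estimates from this chapter. It counts the registry's whole ~€5 fee even though the class shares it, so the total is an upper bound, not your personal share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    monthly_eur: float     # rounded estimate from this chapter
    bills_when_idle: bool  # taken from the idle-billing table


RESOURCES = [
    Resource("PostgreSQL Flexible Server", 13.00, True),
    Resource("Blob Storage", 0.02, True),
    Resource("Container Registry", 5.00, True),  # whole fee, shared by the class
    Resource("Container App Job", 0.00, False),
]


def idle_cost(resources) -> float:
    """Monthly cost of everything that bills even when nothing runs."""
    return sum(r.monthly_eur for r in resources if r.bills_when_idle)


total = idle_cost(RESOURCES)
postgres_share = RESOURCES[0].monthly_eur / total
print(f"Idle cost: ~EUR {total:.2f}/month, Postgres share: {postgres_share:.0%}")
```

Even with the full registry fee included, Postgres accounts for roughly 70% of the idle bill, which is why it is the first place to look for savings.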

Choose the smallest SKU that works

A Standard_B1ms Postgres server costs a fraction of a Standard_D4s_v3. Start small and scale up only when you have evidence you need more.

The same applies to storage: do not pick geo-redundant replication for a dev environment where locally redundant storage (LRS) is enough.

Stop what you are not using

A database that runs 24/7 for a pipeline that runs once a day is wasting money. Many Azure services can be stopped without deleting them. When stopped, you keep your data and configuration but stop paying for compute.

For PostgreSQL Flexible Server, the Azure CLI can stop and start the server:

<aside> 💭 Teacher-only: runs once around class time. Students do not have permission to start or stop the shared Postgres server; copy-pasting these commands returns AuthorizationFailed. Shown so you know how the cost-saving cycle works in practice. Your teacher decides when to stop the server (typically after class) and start it again (before the next class).

</aside>

# Stop the server (no compute charges while stopped)
az postgres flexible-server stop \
  --resource-group rg-hyf-data \
  --name hyf-data-pg

# Start it again when you need it
az postgres flexible-server start \
  --resource-group rg-hyf-data \
  --name hyf-data-pg

A stopped Postgres server still stores your data (you pay for storage), but the compute cost drops to zero. For a Standard_B1ms server, compute costs ~€0.018/hour, so running it 8 hours a day instead of 24 saves about €8.60/month (16 stopped hours × 30 days × €0.018).
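The saving is simple arithmetic, sketched below. It assumes a 30-day month and the rounded ~€0.018/hour compute rate mentioned above. Under those assumptions the saving comes to roughly €8.64; a longer month or a slightly different rate shifts the figure.

```python
HOURLY_RATE_EUR = 0.018  # approximate Standard_B1ms compute rate from the text
DAYS_PER_MONTH = 30      # assumption: a 30-day month


def monthly_compute_cost(hours_per_day: float) -> float:
    """Compute cost for a server that runs `hours_per_day` hours every day."""
    return HOURLY_RATE_EUR * hours_per_day * DAYS_PER_MONTH


def monthly_saving(hours_per_day: float) -> float:
    """Saving compared with leaving the server on 24 hours a day."""
    return monthly_compute_cost(24) - monthly_compute_cost(hours_per_day)


print(f"Always on: ~EUR {monthly_compute_cost(24):.2f}")
print(f"8 h/day:   ~EUR {monthly_compute_cost(8):.2f}")
print(f"Saving:    ~EUR {monthly_saving(8):.2f}")
```

Storage charges are left out because they continue while the server is stopped.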

<aside> ⚠️ Azure automatically restarts a stopped Flexible Server after 7 days if you do not start it yourself. Set a reminder or script the stop/start cycle.

</aside>

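The same principle applies to other compute resources: a VM stops billing for compute once it is stopped and deallocated, and many managed databases have a similar stop option. Check each service before relying on this, though: an App Service plan, for example, keeps billing while its apps are stopped. Storage (blobs, disks) always bills for data at rest, so stopping the compute layer is where the real savings are.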

When to stop vs delete:

| Action | Use when | Data preserved? |
| --- | --- | --- |
| Stop | You need the resource again soon (e.g. next class) | Yes |
| Delete | You are done with it permanently | No |

For this course, your teacher manages the shared server. But in a real project, scheduling stop/start around your pipeline's actual usage hours is the single easiest cost optimization.

Working out the savings is straightforward arithmetic that the widget exercise below lets you practise.

<aside> 🚀 Try it in the widget: Monthly cost in EUR exercise

</aside>

Use tags to track spending

Tags like team=data and project=weather attach to a resource group or individual resource and let you filter cost reports to see where money is going.

<aside> 💭 Teacher-only: tagging the shared resource group. Students do not have write permission on the resource group; copy-pasting az group update returns AuthorizationFailed. Shown so you can see how a real team tags resources for cost attribution. In a job you would own this on your own project's RG.

</aside>

# Add tags to a resource group
az group update --name rg-hyf-data \
  --tags team=data project=weather env=dev

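Add tags early, not at the end: cost reports only pick up a tag from the moment it is applied, so spending before that point stays unattributed.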
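To illustrate what filtering a cost report by tag amounts to, here is a small sketch in plain Python. The records are made-up sample data, and `examplestorage` and `example-registry` are hypothetical names. Real figures come from Cost Analysis, not from this structure.

```python
from collections import defaultdict

# Made-up sample records for illustration only; not real billing data.
COST_RECORDS = [
    {"resource": "hyf-data-pg", "eur": 13.00,
     "tags": {"team": "data", "project": "weather"}},
    {"resource": "examplestorage", "eur": 0.02,
     "tags": {"team": "data", "project": "weather"}},
    {"resource": "example-registry", "eur": 5.00,
     "tags": {"team": "data"}},  # no project tag
]


def cost_by_tag(records, tag_key):
    """Sum cost per value of `tag_key`; untagged resources get their own bucket."""
    totals = defaultdict(float)
    for record in records:
        value = record["tags"].get(tag_key, "(untagged)")
        totals[value] += record["eur"]
    return dict(totals)


print(cost_by_tag(COST_RECORDS, "project"))
```

The `(untagged)` bucket is the part of the bill nobody can attribute, which is the practical reason to tag resources consistently.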

Viewing costs in the portal

Your teacher has granted your group access to view the costs of your resources. Cost data usually lags actual usage by several hours, so expect near-real-time figures rather than a live meter. To inspect the spending:

  1. Open the Azure Portal.
  2. Search for and select the class resource group: rg-hyf-data.
  3. In the left-hand navigation menu, under Cost Management, click on Cost Analysis.
  4. You will see a breakdown of the accumulated costs of your database, storage, and Container App Jobs.

<aside> ⌨️ Hands on: Navigate to the Cost Analysis page for rg-hyf-data in the Azure Portal and check the accumulated costs for this week.

</aside>

Set budget alerts

Azure lets you create budget alerts that notify you when spending crosses a threshold. This catches runaway costs before they become a problem.
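The logic behind a budget alert is a comparison of spending against fractions of a budget. The sketch below shows that idea in plain Python. It is not the Azure API, and the 50%, 80%, and 100% thresholds and the €20 budget are assumptions chosen for illustration.

```python
def crossed_thresholds(spent_eur, budget_eur, thresholds=(0.5, 0.8, 1.0)):
    """Return the alert thresholds (fractions of the budget) already crossed.

    The default thresholds are illustrative, not Azure defaults.
    """
    if budget_eur <= 0:
        raise ValueError("budget_eur must be positive")
    used = spent_eur / budget_eur
    return [t for t in thresholds if used >= t]


# Hypothetical month: EUR 17 spent against a EUR 20 budget.
print(crossed_thresholds(spent_eur=17.0, budget_eur=20.0))
```

Setting more than one threshold gives you an early warning at 50% or 80%, when there is still time to stop a resource, instead of a single alert once the budget is already spent.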

<aside> 💡 Using AI to help: Paste your az resource creation commands into an LLM and ask "What will this cost per month in West Europe?" It can estimate based on published pricing. Always verify against the official pricing calculator. (⚠️ Ensure no PII or sensitive company data is included!)

</aside>

The cost conversation in teams

In professional settings, cost decisions are often shared across the team. Knowing what things cost and proactively optimizing is a signal of professional maturity.

<aside> 🤓 Curious Geek: Cloud cost horror stories

Cloud cost surprises are more common than you think. Troy Hunt (creator of Have I Been Pwned) documented how his cloud costs spiralled due to unexpected bandwidth charges on a service he assumed was cheap. A misconfigured auto-scaling rule or a forgotten GPU cluster can generate massive bills overnight. Companies like Netflix and Spotify employ dedicated "FinOps" teams to manage cloud spending. The lesson: always check pricing, always set alerts.

</aside>

Knowledge Check

<aside> 🚀 Try it in the widget: Interactive Quiz: Cost Awareness

</aside>

https://lasse.be/simple-hyf-teach-widget/mcq.html?bank=week_6_ch6_cost_awareness&embed=1

If estimating Azure costs and reading the Cost Management blade felt unclear, this AZ-900 episode walks through Cost Management from scratch.

<aside> 🎬 Struggling with this concept? Watch this beginner-friendly video:

</aside>

https://www.youtube.com/watch?v=7w88KBVesPI

Extra reading

Ready to apply the concepts? Read the live costs your group can already see in the portal.

<aside> ⌨️ Hands on: Practice with Exercise 5: Cost Analysis in the Azure Portal. Run bash exercise.sh in the Learning-Resources repo, open Cost Analysis for rg-hyf-data, and fill in cost_findings.md.

</aside>


Next up: Practice, where you combine the week's skills: trace the resource group, verify a blob end to end, write to Postgres, create a container job, and read costs in the portal.