Introduction to SQL for Analytics
2. Joins, CTEs, and Aggregations
4. OLAP vs OLTP and Modern Warehouses
5. Building Views in Azure PostgreSQL
Welcome back đ After handing in your mid-track project, navigating the complexities of Docker and automating your first CI/CD pipelines, itâs now time to actually make sense of the data youâve been moving!
In this chapter, we transition from building data pipelines to working directly with data in a structured and analytical way. While Python is commonly used to orchestrate and process data workflows, SQL is the primary language for querying and transforming data at scale within modern data systems.
This shift moves the focus from how data is moved and prepared, to how it is explored, shaped, and turned into insights, which is the ultimate goal!
In the tech world, things move fast, sometimes too fast. Tools like Airflow might get replaced in a decade. But SQL? SQL was invented in the early 1970s and it is still the undisputed champion of the data world.
đĄ
Fun fact: SQL was originally developed at IBM and was first called SEQUEL (Structured English Query Language) because it was designed to read almost like English. Due to trademark issues, the name was shortened to SQL, but many people still pronounce it âsequelâ today.
In a world where everyone uses different programming languages, everyone still speaks SQL! It is really the universal language that connects Data Engineers, Analysts, and Business Stakeholders.
In this section, we explore the distinction between operational SQL, which focuses on day-to-day system behavior, and analytical SQL, which is designed for aggregating, transforming, and analyzing data to generate insights.
We can summarise the main differences between OLTP and OLAP as:
Youâve just spent weeks getting good at Python, so itâs natural you might think, "I'll just pd.read_sql() everything into a DataFrame and do my work there."
Altough tempting, this is a quick path to performance issues, increased costs and operational inefficiencies. Databases are highly optimised systems designed to do exactly what SQL is asking them to do: filter, join, aggregate, and transform data directly where it lives. When you pull large datasets into Python unnecessarily, you:
<aside> đĄ The Principle of Data Locality: Moving data is expensive. If your dataset is stored in an Azure PostgreSQL instance and you want to compute the average fare, SQL performs this computation inside the database and returns a single aggregated result. Python, on the other hand, can easily end up pulling large portions of the dataset into memory first.
</aside>
Choosing between SQL and Python depends on where and how the transformation is best executed besides which transformation you are aiming at.
Use SQL when:
Use Python when:
A useful rule of thumb:
If the data is already in a database and the operation is relational, use SQL. Otherwise, use Python.
Letâs now zoom out a bit to understand SQLâs role: in a modern data pipeline, SQL is most often used in the transformation and reporting layers, sitting between raw data ingestion and business-facing outputs.
As youâve seen in previous chapters, a typical flow looks like this:
This is where SQL plays a central role. Data is cleaned, joined, reshaped, and aggregated into meaningful structures. This may include:
Finally, transformed datasets are used to power dashboards, reports, or downstream analytics tools. This layer focuses on readability, performance, and business relevance.
SQL is particularly powerful in the transformation stage because it allows these operations to happen directly inside the database, close to the data.
A common workflow is:
SQL prepares clean datasets â Python processes or models them â BI tools visualise the results.
When the modeling steps are not needed, BI tools like Tableau or PowerBI connect directly to the output of your SQL datasets. Most BI tools also allow to connect to your SQL database directly and write custom SQL queries within the BI Tool interface itself!
The goal of the analytical exploration can be often grouped into a small number of recurring patterns. Learning to recognise them helps you translate business questions into SQL more effectively:

In the next chapters, weâll look into how to answer such questions using SQL.
Although SQL is a standard language, different database systems implement their own dialects. These dialects share a common core but differ in functions, syntax details, and performance features.
Today, most companies use cloud-based set-ups such as Azue, Google BigQuery or Snowflake. Despite the differences each SQL type has, most core concepts and functions remain transferable.
In this course we use PostgreSQL, which is considered one of the most standards-aligned SQL dialects.
The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

Built with â¤ď¸ by the HackYourFuture community ¡ Thank you, contributors
Found a mistake or have a suggestion? Let us know in the feedback form.