Teachers

Writing to databases

In the previous chapters you’ve learned a lot about various ways of reading data from databases. However, you’ll find yourself needing to write databases frequently too. Writing to databases might feel a little bit more “scary”; if you’re working on a database with actual data someone is using (whether that’s in production, testing, development, or wherever), you know you have to be careful. You don’t want to overwrite the wrong data, or accidentally delete it, or insert something bad.

Luckily, Database Management Systems (DBMS) have given us tools to prevent and recover from such mistakes. To understand how, we first need to understand transactions.

What is a transaction?

Most of the SQL you have written so far has been single statements: one INSERT, one UPDATE, one SELECT. Each statement executes and the database moves on.

Real applications rarely work that way. Almost every meaningful operation involves multiple statements that belong together. Some examples:

Banking system: Transferring money between two bank accounts requires a debit on one and a credit on the other.
E-commerce order placement. When a customer places an order, the application needs to insert a row into an orders table, insert one row per item into order_items, and decrement the stock count for each product in an inventory table.
Postify streaming app: Adding a new album with new tracks requires inserting into albums and into tracks .

A transaction is a way of telling the database: treat all of these statements as a single unit of work. Either all of them succeed together, or none of them do. The database guarantees this, no matter what goes wrong in between.

You can think of a transaction as a bracket around a group of statements:

BEGIN   -- Begins the transaction
  statement 1
  statement 2
  statement 3
COMMIT  -- everything above (the transaction) is saved permanently

The statements in the transaction are saved permanently by COMMITing the transaction. This is an action that DBMS’s often do for you implicitly (called autocommit), but you can do manually as well.

<aside> 💡

If you’ve written insert or update statements before, either directly in SQL or using Java, autocommit was probably turned on already, and you have been working with transactions without even knowing it! 🙃

</aside>

If things go wrong and you don’t want to save your transaction, you can perform a rollback:

BEGIN
  statement 1
  statement 2
  ← error occurs here
ROLLBACK  ← everything above is undone, as if it never happened

The problem: partial updates and data inconsistency

Let’s consider an example to show how transactions, commits, and rollbacks can help. Say we add subscriptions to the Postify app, where users can subscribe with ‘free’ and ‘premium’ tiers. A prescription has a start- and end timestamp, so that can track historically how a user subscribed to the platform. Here’s the SQL for it:

add-subscriptions.sql

Let’s see what would should happen when a user (user_id=3) gets a premium subscription:

-- Step 1
UPDATE subscriptions
SET    ended_at = NOW()
WHERE  user_id = 3
AND    ended_at IS NULL; -- End the previous 'free' subscription

-- Step 2
INSERT INTO subscriptions (user_id, plan_tier, started_at)
VALUES (3, 'premium', NOW()); -- Start a premium subscription.

Now imagine the application executes step 1 successfully, but then something goes wrong before step 2: the server crashes, the database connection drops, a bug throws an exception. The result is a user with no active subscription at all. Their free plan has been ended, but the premium one was never created. From the database's perspective the user simply has no plan. This is inconsistent with reality and will likely cause errors or wrong behaviour everywhere that subscription tier is checked.

This class of problem is called a partial update; some statements in a logical operation succeeded and others did not, leaving the database in an intermediate state that was never supposed to exist.

Partial updates can cause:

Data corruption; rows in a state that violates the intended rules of your system, even if no database constraint is technically broken
Silent wrong behaviour; the application continues running and reads the corrupted state as if it were valid
Hard-to-reproduce bugs; the failure only happens under specific timing or load conditions, making it nearly impossible to debug after the fact

The problem gets worse as systems grow. A single HTTP request in a real backend might touch five or six tables. Without transactions, any of those writes failing mid-way leaves a trail of inconsistent data that is difficult or impossible to clean up automatically.

Transactions solve this by making the entire group of statements atomic (indivisible, if you will). The next section explains exactly what that means.

ACID 🧪

Transactions are governed by four guarantees that database systems have provided since the 1970s. These guarantees are known collectively as ACID.

Atomicity; A transaction is atomic: it either completes in full, or it has no effect at all. There is no such thing as a half-committed transaction. If any statement inside the transaction fails, the database automatically undoes everything that came before it in that same transaction.
Consistency; A transaction can only bring the database from one valid state to another valid state. Valid means all constraints, foreign keys, and rules you have defined on the schema are satisfied. A transaction that would leave the database in an invalid state. Like inserting a streams row with a track_id that references a non-existent track. Such a statement, and all the other ones in the same transaction, are rejected entirely. ****
Isolation; In a real application, hundreds of transactions may be running at the same time. Isolation means each transaction executes as if it were the only one running. The intermediate state of one transaction, with writes it has made but not yet committed, is not visible to other transactions.
Durability; Once a transaction is committed, it stays committed. A server crash, a power failure, a restart — none of these can undo a committed transaction. The database achieves this by writing to a persistent transaction log before confirming the commit to the application.

Working with transactions

Let’s show an example using psql. Previously we created the streams table like so:

CREATE TABLE streams (
    stream_id   SERIAL      PRIMARY KEY,
    user_id     INT         NOT NULL,
    track_id    INT         NOT NULL,
    streamed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    FOREIGN KEY (user_id)  REFERENCES users  (user_id)  ON DELETE CASCADE,
    FOREIGN KEY (track_id) REFERENCES tracks (track_id) ON DELETE CASCADE
);

COMMIT: saving you work

Adding a stream event and making it permanent:

BEGIN;

INSERT INTO streams (user_id, track_id)
VALUES (3, 42);

COMMIT;

After the COMMIT, the row is permanent and visible to all other connections.

ROLLBACK: undoing your work

The most important use of ROLLBACK is recovering from a mistake. Consider accidentally deleting all rows from streams:

BEGIN;

DELETE FROM streams;
-- 502 rows deleted. Your stomach drops - you forgot the WHERE clause again 😱.

ROLLBACK;
-- All 502 rows are back. Crisis averted 😮‍💨.

This doesn’t just work for deletes; you can do updates too.

BEGIN;

UPDATE streams
SET    track_id = 1;
-- Oops - forgot the WHERE clause as always, every row now points to track 1

SELECT DISTINCT track_id FROM streams;
-- Returns only: 1

ROLLBACK;

SELECT DISTINCT track_id FROM streams;
-- All original track IDs are back

Rollbacks can help you recover from software errors (dropped connections, bugs) and user errors (like forgetting the WHERE clause). Additional tip for preventing user errors: start your UPDATE and DELETE statements with the WHERE clause first!

The HackYourFuture curriculum is licensed under CC BY-NC-SA 4.0 *https://hackyourfuture.net/*

CC BY-NC-SA 4.0 Icons

Built with ❤️ by the HackYourFuture community · Thank you, contributors

Found a mistake or have a suggestion? Let us know in the feedback form.

Week 6