Part 2 of Database Internals

ACID, Read Phenomena, and Isolation Levels: What to Use When

18 min read

Most concurrency bugs in production aren’t application bugs — they’re isolation bugs. Two requests arrive at the same instant, both read the same row, both update it, and one update silently disappears. Or a customer is shown a price that’s stale by 50 milliseconds and gets charged a different amount. Or a hotel room gets double-booked because two transactions saw “one room available” at the same time.

The fix isn’t more retries or better logging. It’s understanding what your database actually guarantees, what it doesn’t, and what knobs let you trade one for the other. This post walks through ACID, the five concurrency anomalies you’ll meet in production, the four SQL isolation levels and what each one costs you, how Postgres implements them, and a decision tree for picking the right level — without paying for SERIALIZABLE everywhere.


The Vocabulary

Before any code, the words. Most isolation-level confusion comes from these terms being used loosely.

Transaction — a unit of work grouping one or more operations into a single logical action. Either all succeed (COMMIT) or none do (ROLLBACK). Started with BEGIN.

ACID — the four guarantees a “safe” transaction provides: Atomicity (all-or-nothing), Consistency (constraints hold), Isolation (no interference with concurrent transactions), Durability (committed data survives crashes).

Read phenomenon — a specific kind of anomaly that becomes possible when isolation is relaxed. Dirty reads, non-repeatable reads, phantom reads, lost updates, write skew — each one a real bug pattern, named after the SQL standard’s classification.

Isolation level — the dial controlling which phenomena your transactions are allowed to observe. Higher level = fewer anomalies = more contention. Four standard levels: Read Uncommitted, Read Committed, Repeatable Read, Serializable.

MVCC (Multi-Version Concurrency Control) — the mechanism that lets readers and writers not block each other. Every row keeps multiple versions; each transaction sees the version that was committed when it started. Used by Postgres, MySQL InnoDB, Oracle.

Snapshot — a point-in-time view of the database visible only to one transaction. Under MVCC, every transaction operates against its own snapshot. Reads inside the transaction stay consistent even as other transactions commit.

Lock — a database-managed token a transaction holds on a resource (row, range, table) to control concurrent access. Shared (read) locks allow other readers but block writers. Exclusive (write) locks block everyone else. Predicate locks cover a query condition (WHERE status = 'available'), not just specific rows.

SSI (Serializable Snapshot Isolation) — Postgres’s implementation of SERIALIZABLE. Tracks read-write dependency cycles between concurrent transactions and aborts the one that would cause an anomaly, instead of blocking them with predicate locks. Cheaper than classical serializable, but you must handle serialization-failure errors and retry.

Pessimistic vs optimistic concurrency control — pessimistic locks resources up front (SELECT … FOR UPDATE). Optimistic doesn’t lock, but checks at commit time whether the data you read has been modified by someone else, and retries if so. Pessimistic is safer under contention; optimistic is faster when contention is rare.

WAL (Write-Ahead Log) — every change is appended to a log file before the in-memory page is touched. Durability comes from fsync’ing the WAL before COMMIT returns success. Recovery replays the WAL from the last checkpoint.


ACID, Briefly

The four properties, in one paragraph each.

Atomicity — all-or-nothing

Every operation inside a transaction is committed together, or the entire transaction is rolled back on failure. Alice transfers $500 to Bob:

BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 'alice';
  UPDATE accounts SET balance = balance + 500 WHERE id = 'bob';
COMMIT;

If the server crashes after the first UPDATE but before the second, atomicity ensures the rollback on recovery restores Alice’s balance. Money never disappears mid-transfer.

Consistency — constraints hold

Every transaction must leave the database satisfying all declared rules — foreign keys, unique constraints, CHECK constraints, triggers. If Alice has $200 and a debit of $500 would violate CHECK (balance >= 0), the entire transaction is rejected. No partial state, no application bypass.

Isolation — concurrent transactions don’t interfere

This is the most nuanced of the four. Full isolation is expensive, so databases let you tune it via isolation levels. The rest of this post is mostly about this property.

Durability — committed data survives crashes

Once COMMIT returns success, the change is persisted to disk via the WAL, even if the power cuts out a millisecond later. This is why commits have an I/O cost — durability isn’t free.


The Five Read Phenomena

When isolation is relaxed, transactions can observe these anomalies. The SQL standard names three; two more (lost update and write skew) are real bugs you’ll hit in practice that don’t appear in the original definitions.

1. Dirty Read

A transaction reads data written by another transaction that has not yet committed. If the writer rolls back, the reader has acted on data that never officially existed.

Transaction B read a value that was rolled back. The seat was available all along — but the customer was already told it wasn’t.

2. Non-Repeatable Read

A transaction reads the same row twice and gets different values, because another committed transaction modified that row in between.

Same row, same transaction, two different values. Causes silent billing discrepancies that only surface in customer complaints.

3. Phantom Read

A transaction re-executes a query with a filter and gets a different set of rows — not because existing rows changed, but because another transaction inserted or deleted matching rows.

No existing row was modified between Transaction A’s two reads — a new row appeared and changed the result. Especially dangerous in booking, ticketing, and inventory systems.

4. Lost Update

Two transactions read the same value, independently compute an update, and write it back — one silently overwrites the other.

Two orders shipped, stock only dropped by 1. At scale this leads to overselling — shipping products that don’t exist. The safe fix is an atomic update: SET stock = stock - 1 instead of a read-modify-write cycle.

5. Write Skew

Two transactions read the same rows, see a consistent snapshot, and each updates different rows — but the combined result violates a constraint that no single transaction would have broken on its own.

Neither transaction modified a row the other read, so standard MVCC snapshot rules don’t see a conflict. Snapshot isolation (Postgres Repeatable Read) does not prevent this. Only true SERIALIZABLE does.


The Four Isolation Levels

The SQL standard defines four levels. Each prevents a progressively stricter set of phenomena.

Read Uncommitted

The weakest level. Transactions can read uncommitted data from others.

  • Prevents: nothing
  • Allows: all five phenomena
  • When to use: almost never. Possibly rough approximate analytics where slightly dirty counts are acceptable and you need maximum read throughput.
  • When to avoid: any transactional system. Any read where the answer drives a decision. Anywhere correctness matters.
  • Postgres note: Postgres internally treats READ UNCOMMITTED as READ COMMITTED — it never returns dirty reads even when you ask for them.

Read Committed

Transactions only see committed data. Each statement gets a fresh view of the latest committed state, so the same query inside one transaction may return different rows if run twice.

  • Prevents: dirty reads
  • Allows: non-repeatable reads, phantom reads, lost updates, write skew
  • When to use: the safe baseline for most OLTP applications — dashboards, user profile reads, standard CRUD. Default in Postgres, Oracle, SQL Server.
  • When to avoid: multi-step logic that reads the same row twice (you’ll see shifting values mid-transaction). Operations that compute derived results from multiple rows that need to be a consistent set.

Repeatable Read

Once a transaction reads a row, the transaction continues to see the snapshot from when it began — even if other transactions commit changes to that row in the meantime.

  • Prevents: dirty reads, non-repeatable reads
  • Allows: phantom reads (in standard implementations), write skew
  • When to use: multi-step business logic that reads the same data multiple times and needs a consistent view. Generating invoices, calculating totals, multi-query reports.
  • When to avoid: anywhere write skew matters (on-call rotations, multi-row constraint enforcement). Use SERIALIZABLE instead.
  • Postgres note: Repeatable Read in Postgres also prevents lost updates by aborting the second transaction with a 40001 serialization error. MySQL’s InnoDB doesn’t — use SELECT … FOR UPDATE or atomic updates if you’re on MySQL.
  • MySQL note: MySQL InnoDB’s Repeatable Read prevents phantom reads via gap locks on locking reads (SELECT … FOR UPDATE). Plain SELECT uses snapshot reads, which can still observe phantoms in some edge cases.

Serializable

The strictest level. Transactions execute as if they ran one at a time, sequentially. No concurrency anomalies are possible.

  • Prevents: all five phenomena
  • Allows: nothing — at the cost of more transaction aborts under contention
  • When to use: financial ledgers, inventory reservation, double-entry bookkeeping, regulatory compliance — anywhere a bug means lost money, corrupted data, or violated invariants. Use it precisely where you need it.
  • When to avoid: as a blanket “be safe” setting. The retry cost on contention is real, and most application code doesn’t handle serialization-failure errors correctly. Use Read Committed by default and raise to SERIALIZABLE only on the transactions that genuinely need it.
  • Postgres note: Implemented via SSI (Serializable Snapshot Isolation) — see below.

The phenomena-by-level matrix

Isolation LevelDirty ReadNon-RepeatablePhantomLost UpdateWrite Skew
Read UncommittedPossiblePossiblePossiblePossiblePossible
Read CommittedPreventedPossiblePossiblePossiblePossible
Repeatable ReadPreventedPreventedPossible *Prevented (PG) †Possible
SerializablePreventedPreventedPreventedPreventedPrevented

* MySQL InnoDB’s Repeatable Read also prevents phantom reads via gap locks. † PostgreSQL’s Repeatable Read prevents lost updates by aborting the conflicting transaction. MySQL InnoDB’s does not.


How Postgres Implements Isolation

Two mechanisms, layered.

MVCC — readers and writers don’t block each other

Every UPDATE writes a new row version; the old version stays until vacuum reclaims it. Each transaction’s snapshot determines which version it sees. Consequences:

  • Readers never block writers and vice versa. A long-running SELECT doesn’t lock anyone out of writing.
  • Repeatable Read is implemented by simply pinning the snapshot for the entire transaction.
  • The cost is bloat — dead row versions accumulate, and autovacuum reclaims them in the background.

SSI — Serializable without predicate locking

Classical serializable isolation requires predicate locks: locks not on rows, but on query conditions. They’re correct but block aggressively. Postgres instead uses Serializable Snapshot Isolation (SSI):

  1. Each transaction runs at Repeatable Read (its own snapshot).
  2. Postgres tracks read-write dependencies between concurrent transactions in the background.
  3. At commit time, if Postgres detects a dependency cycle that would produce a non-serializable schedule, it aborts the offending transaction with a serialization-failure error.

The trade: less blocking, more aborts. Your application must be prepared to retry transactions that fail with SQLSTATE 40001:

-- In application code:
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... do work ...
COMMIT;
-- If COMMIT returns serialization_failure, sleep briefly and retry
-- the whole transaction from BEGIN.

This retry loop is non-negotiable for SERIALIZABLE. Skipping it means random transaction failures under load.

A note on other databases

  • Oracle’s “Serializable” is actually snapshot isolation — it permits write skew, despite the name.
  • SQL Server has “Serializable” (predicate locking) and “Snapshot” (MVCC) as distinct levels.
  • MySQL InnoDB’s Repeatable Read is unusual: it prevents phantoms with gap locks but not lost updates with snapshot conflict detection.

Treat “Serializable” with skepticism across systems — read your DB’s actual semantics.


Concurrency Tools in Application Code

Isolation levels handle concurrency at the database level. In application code, three more tools cover the cases isolation alone won’t fix.

SELECT … FOR UPDATE (Pessimistic Locking)

Acquires a row-level write lock as part of the read. Other transactions that try to read the same row with FOR UPDATE block until you commit.

BEGIN;
  SELECT stock FROM products WHERE sku = 'WIDGET' FOR UPDATE;
  -- application logic decides what to write
  UPDATE products SET stock = stock - 1 WHERE sku = 'WIDGET';
COMMIT;

Use when you need to read, decide, and write atomically and the read must reflect the latest committed value, not a snapshot. Common in inventory deduction, ticket reservation, money transfer.

Watch out: holding a row lock across an HTTP boundary or a slow operation pins the lock for the duration. A SELECT … FOR UPDATE followed by a 2-second external API call followed by COMMIT means every concurrent request for that row waits 2 seconds.

Atomic Updates

When the new value is a pure function of the old one, skip the lock and let the database compute it:

UPDATE products
SET stock = stock - 1
WHERE sku = 'WIDGET' AND stock > 0;

Single statement, no read-modify-write race. The AND stock > 0 predicate prevents negative stock without an extra check. The simplest correct solution to lost updates when applicable.

Optimistic Concurrency Control (Version Columns)

Add a version column. Every update increments it. Every update conditions the write on the version it last read:

SELECT id, balance, version FROM accounts WHERE id = 42;
-- application: balance=100, version=7
UPDATE accounts
SET balance = 80, version = 8
WHERE id = 42 AND version = 7;

If another transaction wrote first, the row’s version is now 8 and your WHERE version = 7 matches zero rows. The application sees rowcount = 0 and retries the read-modify-write cycle.

Pros: no blocking, no deadlocks, works across HTTP boundaries (the version travels in the request). Cons: requires retry logic, performs poorly under high contention.

The decision between pessimistic and optimistic comes down to contention rate. Rare contention → optimistic. Frequent contention → pessimistic.


Decision Tree

ScenarioReach forWhy
Standard web app CRUD, dashboards, profile readsREAD COMMITTEDDefault. Cheap. Anomalies are real but acceptable in this domain.
Multi-step report or invoice — query the same data multiple times in one transactionREPEATABLE READConsistent snapshot. Avoids the price-changed-mid-transaction class of bug.
Concurrent decrement of a counter / stock / balanceAtomic UPDATE ... SET x = x - 1 at READ COMMITTEDSingle statement, no race. Don’t reach for higher isolation.
Need to read, decide, then write atomically; long-lived contention expectedSELECT … FOR UPDATE at READ COMMITTEDPessimistic lock on the row. Easy mental model. Watch lock duration.
Same, but contention is rare and you can retryOptimistic version column at READ COMMITTEDNo blocking, works across HTTP boundaries, just needs retry logic.
Multi-row invariant (“at least one doctor on call,” “no double-booking”)SERIALIZABLEOnly level that catches write skew. Implement retry on serialization failure.
Financial transfers, double-entry ledgerSERIALIZABLEThe lost-money class of bug. Worth the aborts.

The pattern: default to READ COMMITTED, raise selectively where the workload demands it. Most applications need higher isolation only on specific endpoints — not as a global setting.


Common Mistakes

  • Treating the ORM default as the right answer. Hibernate/JPA defaults to REPEATABLE_READ, Rails ActiveRecord uses READ COMMITTED, Django passes through to the DB default. Check what your stack is actually doing before assuming.
  • Setting SERIALIZABLE globally as “safety.” Most code paths don’t need it, and most application code doesn’t handle serialization-failure retries. The result is random 40001 errors under load that look like flaky tests.
  • Using SERIALIZABLE without retry logic. SSI assumes the application retries aborted transactions. Without retry, SERIALIZABLE is strictly worse than Repeatable Read.
  • Holding a transaction open across a network round-trip. Every external API call, every queue write, every cross-service hop inside a BEGIN … COMMIT extends lock duration. Open the transaction at the last possible moment, close it immediately.
  • Read-modify-write where atomic update would work. SELECT balance → decide → UPDATE balance invites lost updates. Use UPDATE … SET balance = balance + ? instead.
  • Assuming MySQL Repeatable Read prevents phantoms in plain SELECT. It does only for locking reads. Plain SELECT uses snapshot reads.
  • Assuming Repeatable Read prevents write skew. It doesn’t — that’s specifically what SERIALIZABLE (or explicit row locking via FOR UPDATE) is for.
  • Long-running transactions on a busy primary. MVCC bloat: every transaction’s snapshot pins all row versions newer than the snapshot. A 10-minute transaction keeps 10 minutes of dead versions around, slowing every other query.
  • Mixing optimistic and pessimistic in the same flow. Confusing to reason about. Pick one strategy per resource.
  • Trusting “Serializable” as a name. Oracle’s SERIALIZABLE is snapshot isolation. Always check your DB’s actual semantics.
  • Ignoring deadlocks. Pessimistic locking on multiple rows in different orders deadlocks under concurrency. Standardize the lock order (e.g., always lock the smaller account ID first in a transfer).

Closing

Isolation level isn’t a “set and forget” knob. It’s a per-workload decision that interacts with how you structure your transactions, how long you hold them open, and what your application does when a commit fails. The teams that get this right are usually the ones who default to READ COMMITTED, identify which specific transactions actually need stronger guarantees, raise those to REPEATABLE READ or SERIALIZABLE, and handle retries in code. The teams that get it wrong either set SERIALIZABLE everywhere and wonder why their throughput collapsed, or stay on READ COMMITTED and write SELECT … FOR UPDATE everywhere as a substitute.

Getting these wrong is how production databases produce corrupted data, double-bookings, billing discrepancies, and race conditions that only show up at 3 AM under peak load. Getting them right is mostly about understanding which anomaly your code can encounter, choosing the cheapest level that prevents it, and writing the retry path for when the database asks you to.