← back

Data Storage

Database Indexing

B-trees, LSM trees, hash indexes, and composite indexes. Understand how indexes speed up reads, slow down writes, and when to use each type.

Database Indexing

An index is a data structure that the database maintains alongside your table to make lookups faster. Without an index, the database must scan every row in the table to answer a query (a "full table scan"). With the right index, it can jump directly to the relevant rows.

Indexing is fundamental to system design because database performance is often the bottleneck, and indexing decisions can make the difference between a query taking 5ms vs 5 seconds.

Why Indexes Matter

Consider a users table with 100 million rows. Without an index on `email`:

1
2
3
SELECT * FROM users WHERE email = 'alice@example.com';
-- Full table scan: O(N) = 100,000,000 row comparisons
-- Time: ~30 seconds

With a B-tree index on `email`:

1
2
3
SELECT * FROM users WHERE email = 'alice@example.com';
-- Index lookup: O(log N) ≈ 27 comparisons
-- Time: ~1 millisecond

That is a 30,000x improvement. This is not an exaggeration — it is the real-world difference between an indexed and unindexed query on a large table.

B-Tree Indexes

B-trees (specifically B+ trees) are the default index type in virtually every relational database: PostgreSQL, MySQL, SQLite, Oracle, and SQL Server all use them.

How B-Trees Work

A B-tree is a balanced tree where each node contains multiple keys and pointers. Internal nodes act as routing guides, and leaf nodes contain the actual indexed values along with pointers to the corresponding rows on disk.

1
2
3
4
5
6
7
                    [50]
                   /    \
            [20, 35]    [65, 80]
           /   |   \   /   |   \
        [10] [25] [40] [55] [70] [90]
         |    |    |    |    |    |
        rows rows rows rows rows rows

Key properties:

  • Balanced — all leaf nodes are at the same depth, so lookups are always O(log N)
  • Each node is sized to fit a disk page (typically 4KB or 8KB), minimizing I/O
  • Leaf nodes are linked together, enabling efficient range scans
  • Supports equality lookups, range queries, and prefix matching

B-Tree Operations

OperationTime ComplexityNotes
Point lookup (`WHERE x = 5`)O(log N)Traverse from root to leaf
Range scan (`WHERE x BETWEEN 1 AND 100`)O(log N + K)Find start, scan linked leaves
InsertO(log N)May cause node splits
DeleteO(log N)May cause node merges

Write Amplification

Every write to the table requires updating every index on that table. If a table has 5 indexes and you insert a row, the database must perform 6 writes: one to the table and one to each index. This is why indexes speed up reads but slow down writes.

LSM Trees (Log-Structured Merge Trees)

LSM trees are the index structure used by write-optimized databases like Apache Cassandra, LevelDB, RocksDB, and ScyllaDB.

How LSM Trees Work

Instead of updating data in place (like B-trees), LSM trees buffer writes in memory and flush them to disk in sorted batches:

  1. Memtable: Writes go to an in-memory sorted structure (usually a red-black tree or skip list)
  2. Flush: When the memtable reaches a size threshold, it is written to disk as an immutable SSTable (Sorted String Table)
  3. Compaction: Background process merges multiple SSTables into larger ones, removing duplicates and deleted entries
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Write path:
  Client Write -> Memtable (in-memory, sorted)
                     |
                     v (flush when full)
                  SSTable L0 (disk)
                     |
                     v (compaction)
                  SSTable L1 (disk, larger)
                     |
                     v (compaction)
                  SSTable L2 (disk, even larger)

Read path:
  Check Memtable -> Check L0 SSTables -> Check L1 -> ... -> Check Ln
  (Use Bloom filters to skip SSTables that definitely don't contain the key)

B-Tree vs LSM Tree Comparison

AspectB-TreeLSM Tree
Write throughputLower (random I/O)Higher (sequential I/O)
Read latencyLower (single lookup)Higher (may check multiple levels)
Space amplificationLowerHigher (duplicates across levels)
Write amplificationModerateCan be high (compaction rewrites)
Range scansExcellentGood (after compaction)
Use caseRead-heavy OLTPWrite-heavy workloads

Rule of thumb: If your workload is read-heavy with moderate writes (most web applications), B-trees are the better default. If you have extreme write throughput (IoT, time-series, logging), LSM trees are worth considering.

Hash Indexes

A hash index uses a hash table to map keys to row locations. The database computes `hash(key)` and directly looks up the row.

1
hash("alice@example.com") -> bucket 42 -> row pointer

Pros: O(1) lookups for exact-match queries — faster than B-trees for point lookups.

Cons:

  • Cannot support range queries (`WHERE age > 25` requires a scan)
  • Cannot support sorting (`ORDER BY`)
  • Cannot support prefix matching (`WHERE name LIKE 'Ali%'`)
  • Only useful for equality comparisons (`=`, `IN`)

When to use: When you only ever query by exact match and need maximum lookup speed. In practice, B-trees are nearly as fast for point lookups and far more versatile, so hash indexes are rarely used explicitly.

PostgreSQL note: PostgreSQL supports hash indexes but the documentation historically discouraged them. As of PostgreSQL 10, they are WAL-logged and crash-safe, but B-trees are still the default recommendation.

Composite (Multi-Column) Indexes

A composite index covers multiple columns. The order of columns in the index matters enormously.

1
CREATE INDEX idx_user_city_age ON users(city, age);

This index is a B-tree sorted first by `city`, then by `age` within each city. Think of it like a phone book sorted by last name, then first name.

The Leftmost Prefix Rule

A composite index on `(A, B, C)` can be used for queries on:

  • `A` alone
  • `A` and `B` together
  • `A`, `B`, and `C` together

It cannot be used efficiently for:

  • `B` alone (skips the first column)
  • `C` alone
  • `B` and `C` together
1
2
3
4
5
6
7
8
9
10
11
# Index on (city, age, name)

# USES the index (leftmost prefix match):
SELECT * FROM users WHERE city = 'NYC';
SELECT * FROM users WHERE city = 'NYC' AND age = 30;
SELECT * FROM users WHERE city = 'NYC' AND age > 25 AND name = 'Alice';

# DOES NOT use the index efficiently:
SELECT * FROM users WHERE age = 30;              # Skips 'city'
SELECT * FROM users WHERE name = 'Alice';         # Skips 'city' and 'age'
SELECT * FROM users WHERE age = 30 AND name = 'X'; # Skips 'city'

Column Order Strategy

Put the most selective column (highest cardinality) that you filter with equality first. Put range-scanned columns last.

1
2
3
4
# If you query: WHERE status = 'active' AND created_at > '2024-01-01'
# Good index order: (status, created_at)
# Bad index order:  (created_at, status)
# Why? Equality on status narrows results, then range scan on created_at within that subset.

Covering Indexes

A covering index includes all columns needed by a query, so the database can answer the query entirely from the index without touching the table data (no "heap lookup").

1
2
3
4
5
-- Query:
SELECT name, email FROM users WHERE city = 'NYC';

-- Covering index:
CREATE INDEX idx_cover ON users(city) INCLUDE (name, email);

The `INCLUDE` clause (PostgreSQL) or covering columns store the extra data in the index leaf nodes. The database reads only the index — no table access needed.

When this matters: For read-heavy queries on large tables, avoiding the heap lookup can be a 2-5x speedup because the index is much smaller than the table and more likely to fit in memory.

When Indexes Hurt

Indexes are not free. Every index you add has costs:

Write penalty: Each INSERT, UPDATE, or DELETE must update every index. A table with 10 indexes makes every write ~10x more expensive than an unindexed table.

Storage cost: Indexes consume disk space. A B-tree index on a 100GB table might be 20-30GB. Multiple indexes can exceed the table size.

Maintenance overhead: Indexes can become fragmented over time and need periodic rebuilding (REINDEX in PostgreSQL, OPTIMIZE TABLE in MySQL).

Optimizer confusion: Too many indexes can confuse the query optimizer. It might choose a suboptimal index or spend too long planning.

Signs You Have Too Many Indexes

  • Write-heavy tables with slow INSERT/UPDATE performance
  • Index storage exceeding table storage
  • Indexes that never get used (`pg_stat_user_indexes` in PostgreSQL shows index usage counts)

Signs You Need More Indexes

  • Slow queries with sequential scans on large tables
  • High disk I/O during read operations
  • `EXPLAIN` showing "Seq Scan" instead of "Index Scan"

Index Selection Strategy

Here is a practical framework for choosing indexes:

  1. Start with your queries. List the most frequent and most performance-critical queries. Indexes should serve queries, not the other way around.
  1. Index primary lookup patterns. If you always look up users by email, index email. If you always filter orders by user_id and status, create a composite index on (user_id, status).
  1. Check query plans. Use `EXPLAIN ANALYZE` (PostgreSQL) or `EXPLAIN` (MySQL) to see if your queries use indexes or fall back to sequential scans.
1
2
3
4
5
6
7
8
9
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123 AND status = 'shipped';

-- Good output:
-- Index Scan using idx_orders_user_status on orders
-- Planning Time: 0.1ms, Execution Time: 0.5ms

-- Bad output:
-- Seq Scan on orders (filter: user_id = 123 AND status = 'shipped')
-- Planning Time: 0.1ms, Execution Time: 3500ms
  1. Monitor unused indexes. Periodically check which indexes are actually being used and drop the ones that are not.
  1. Consider covering indexes for your top 2-3 most critical queries to eliminate heap lookups.

Specialized Index Types

GIN (Generalized Inverted Index): Used for full-text search, array columns, and JSONB queries in PostgreSQL. Essential for `@>`, `?`, and text search operators.

GiST (Generalized Search Tree): Used for geometric data, range types, and full-text search. Supports operators like "contains," "overlaps," and "nearest neighbor."

BRIN (Block Range Index): Extremely compact index for columns that are naturally ordered (e.g., timestamp columns where rows are inserted chronologically). Stores min/max values per block range rather than individual values. Uses 1000x less space than a B-tree.

Partial Index: An index on a subset of rows. For example, indexing only active users:

1
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

This is much smaller and faster than indexing all users if most queries only care about active ones.

Interview Tips

Always propose indexes for your schema. When you design a database schema in an interview, immediately identify the access patterns and propose appropriate indexes. This shows you think about performance.

Know the write cost. If an interviewer asks "why not just index every column?", explain the write amplification cost and storage overhead.

Use composite indexes wisely. A single well-designed composite index often replaces 2-3 individual indexes. Explain the leftmost prefix rule.

Mention covering indexes for critical paths. If a query is on your hot path (e.g., the main feed query), a covering index that avoids heap lookups is a powerful optimization.

Connect to sharding. When discussing sharded databases, note that each shard has its own indexes. Global secondary indexes (across shards) are expensive and eventually consistent.

EXPLAIN is your friend. Mention that you would use EXPLAIN ANALYZE to validate index effectiveness. This shows practical experience rather than just theoretical knowledge.