Data Storage
B-trees, LSM trees, hash indexes, and composite indexes. Understand how indexes speed up reads, slow down writes, and when to use each type.
An index is a data structure that the database maintains alongside your table to make lookups faster. Without an index, the database must scan every row in the table to answer a query (a "full table scan"). With the right index, it can jump directly to the relevant rows.
Indexing is fundamental to system design because database performance is often the bottleneck, and indexing decisions can make the difference between a query taking 5ms vs 5 seconds.
Consider a users table with 100 million rows. Without an index on `email`:
SELECT * FROM users WHERE email = 'alice@example.com';
-- Full table scan: O(N) = 100,000,000 row comparisons
-- Time: ~30 secondsWith a B-tree index on `email`:
SELECT * FROM users WHERE email = 'alice@example.com';
-- Index lookup: O(log N) ≈ 27 comparisons
-- Time: ~1 millisecondThat is a 30,000x improvement. This is not an exaggeration — it is the real-world difference between an indexed and unindexed query on a large table.
B-trees (specifically B+ trees) are the default index type in virtually every relational database: PostgreSQL, MySQL, SQLite, Oracle, and SQL Server all use them.
A B-tree is a balanced tree where each node contains multiple keys and pointers. Internal nodes act as routing guides, and leaf nodes contain the actual indexed values along with pointers to the corresponding rows on disk.
[50]
/ \
[20, 35] [65, 80]
/ | \ / | \
[10] [25] [40] [55] [70] [90]
| | | | | |
rows rows rows rows rows rowsKey properties:
| Operation | Time Complexity | Notes |
|---|---|---|
Point lookup (`WHERE x = 5`) | O(log N) | Traverse from root to leaf |
Range scan (`WHERE x BETWEEN 1 AND 100`) | O(log N + K) | Find start, scan linked leaves |
| Insert | O(log N) | May cause node splits |
| Delete | O(log N) | May cause node merges |
Every write to the table requires updating every index on that table. If a table has 5 indexes and you insert a row, the database must perform 6 writes: one to the table and one to each index. This is why indexes speed up reads but slow down writes.
LSM trees are the index structure used by write-optimized databases like Apache Cassandra, LevelDB, RocksDB, and ScyllaDB.
Instead of updating data in place (like B-trees), LSM trees buffer writes in memory and flush them to disk in sorted batches:
Write path:
Client Write -> Memtable (in-memory, sorted)
|
v (flush when full)
SSTable L0 (disk)
|
v (compaction)
SSTable L1 (disk, larger)
|
v (compaction)
SSTable L2 (disk, even larger)
Read path:
Check Memtable -> Check L0 SSTables -> Check L1 -> ... -> Check Ln
(Use Bloom filters to skip SSTables that definitely don't contain the key)| Aspect | B-Tree | LSM Tree |
|---|---|---|
| Write throughput | Lower (random I/O) | Higher (sequential I/O) |
| Read latency | Lower (single lookup) | Higher (may check multiple levels) |
| Space amplification | Lower | Higher (duplicates across levels) |
| Write amplification | Moderate | Can be high (compaction rewrites) |
| Range scans | Excellent | Good (after compaction) |
| Use case | Read-heavy OLTP | Write-heavy workloads |
Rule of thumb: If your workload is read-heavy with moderate writes (most web applications), B-trees are the better default. If you have extreme write throughput (IoT, time-series, logging), LSM trees are worth considering.
A hash index uses a hash table to map keys to row locations. The database computes `hash(key)` and directly looks up the row.
hash("alice@example.com") -> bucket 42 -> row pointerPros: O(1) lookups for exact-match queries — faster than B-trees for point lookups.
Cons:
`WHERE age > 25` requires a scan)`ORDER BY`)`WHERE name LIKE 'Ali%'`)`=`, `IN`)When to use: When you only ever query by exact match and need maximum lookup speed. In practice, B-trees are nearly as fast for point lookups and far more versatile, so hash indexes are rarely used explicitly.
PostgreSQL note: PostgreSQL supports hash indexes but the documentation historically discouraged them. As of PostgreSQL 10, they are WAL-logged and crash-safe, but B-trees are still the default recommendation.
A composite index covers multiple columns. The order of columns in the index matters enormously.
CREATE INDEX idx_user_city_age ON users(city, age);This index is a B-tree sorted first by `city`, then by `age` within each city. Think of it like a phone book sorted by last name, then first name.
A composite index on `(A, B, C)` can be used for queries on:
`A` alone`A` and `B` together`A`, `B`, and `C` togetherIt cannot be used efficiently for:
`B` alone (skips the first column)`C` alone`B` and `C` together# Index on (city, age, name)
# USES the index (leftmost prefix match):
SELECT * FROM users WHERE city = 'NYC';
SELECT * FROM users WHERE city = 'NYC' AND age = 30;
SELECT * FROM users WHERE city = 'NYC' AND age > 25 AND name = 'Alice';
# DOES NOT use the index efficiently:
SELECT * FROM users WHERE age = 30; # Skips 'city'
SELECT * FROM users WHERE name = 'Alice'; # Skips 'city' and 'age'
SELECT * FROM users WHERE age = 30 AND name = 'X'; # Skips 'city'Put the most selective column (highest cardinality) that you filter with equality first. Put range-scanned columns last.
# If you query: WHERE status = 'active' AND created_at > '2024-01-01'
# Good index order: (status, created_at)
# Bad index order: (created_at, status)
# Why? Equality on status narrows results, then range scan on created_at within that subset.A covering index includes all columns needed by a query, so the database can answer the query entirely from the index without touching the table data (no "heap lookup").
-- Query:
SELECT name, email FROM users WHERE city = 'NYC';
-- Covering index:
CREATE INDEX idx_cover ON users(city) INCLUDE (name, email);The `INCLUDE` clause (PostgreSQL) or covering columns store the extra data in the index leaf nodes. The database reads only the index — no table access needed.
When this matters: For read-heavy queries on large tables, avoiding the heap lookup can be a 2-5x speedup because the index is much smaller than the table and more likely to fit in memory.
Indexes are not free. Every index you add has costs:
Write penalty: Each INSERT, UPDATE, or DELETE must update every index. A table with 10 indexes makes every write ~10x more expensive than an unindexed table.
Storage cost: Indexes consume disk space. A B-tree index on a 100GB table might be 20-30GB. Multiple indexes can exceed the table size.
Maintenance overhead: Indexes can become fragmented over time and need periodic rebuilding (REINDEX in PostgreSQL, OPTIMIZE TABLE in MySQL).
Optimizer confusion: Too many indexes can confuse the query optimizer. It might choose a suboptimal index or spend too long planning.
`pg_stat_user_indexes` in PostgreSQL shows index usage counts)`EXPLAIN` showing "Seq Scan" instead of "Index Scan"Here is a practical framework for choosing indexes:
`EXPLAIN ANALYZE` (PostgreSQL) or `EXPLAIN` (MySQL) to see if your queries use indexes or fall back to sequential scans.EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123 AND status = 'shipped';
-- Good output:
-- Index Scan using idx_orders_user_status on orders
-- Planning Time: 0.1ms, Execution Time: 0.5ms
-- Bad output:
-- Seq Scan on orders (filter: user_id = 123 AND status = 'shipped')
-- Planning Time: 0.1ms, Execution Time: 3500msGIN (Generalized Inverted Index): Used for full-text search, array columns, and JSONB queries in PostgreSQL. Essential for `@>`, `?`, and text search operators.
GiST (Generalized Search Tree): Used for geometric data, range types, and full-text search. Supports operators like "contains," "overlaps," and "nearest neighbor."
BRIN (Block Range Index): Extremely compact index for columns that are naturally ordered (e.g., timestamp columns where rows are inserted chronologically). Stores min/max values per block range rather than individual values. Uses 1000x less space than a B-tree.
Partial Index: An index on a subset of rows. For example, indexing only active users:
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';This is much smaller and faster than indexing all users if most queries only care about active ones.
Always propose indexes for your schema. When you design a database schema in an interview, immediately identify the access patterns and propose appropriate indexes. This shows you think about performance.
Know the write cost. If an interviewer asks "why not just index every column?", explain the write amplification cost and storage overhead.
Use composite indexes wisely. A single well-designed composite index often replaces 2-3 individual indexes. Explain the leftmost prefix rule.
Mention covering indexes for critical paths. If a query is on your hot path (e.g., the main feed query), a covering index that avoids heap lookups is a powerful optimization.
Connect to sharding. When discussing sharded databases, note that each shard has its own indexes. Global secondary indexes (across shards) are expensive and eventually consistent.
EXPLAIN is your friend. Mention that you would use EXPLAIN ANALYZE to validate index effectiveness. This shows practical experience rather than just theoretical knowledge.