Classic Designs
Design a typeahead system like Google search suggestions. Covers trie data structures, top-K queries, prefix caching, and real-time ranking.
A search autocomplete (typeahead) system provides real-time suggestions as a user types a query. Think of Google's search bar: after typing "how to", you instantly see suggestions like "how to tie a tie" and "how to boil eggs." This question tests your knowledge of trie data structures, top-K algorithms, caching strategies, and the balance between precomputation and real-time serving.
Assumptions:
- 5 billion searches per day (Google scale)
- Average query length: 20 characters → 20 autocomplete requests per query
- 5B x 20 = 100 billion autocomplete requests/day
- 100B / 86400 ≈ 1.16 million requests/sec
Unique queries:
- ~500 million unique queries per day
- Store top 10 million most popular queries (covers 99%+ of autocomplete hits)
- Average query: 20 bytes → 10M x 20 = 200 MB (fits easily in memory)
Cache:
- Unique prefixes for top queries: ~50 million prefixes
- Each prefix maps to 10 suggestions x 20 bytes = 200 bytes
- 50M x 200 = 10 GB (fits in a Redis cluster)A trie (prefix tree) is the natural data structure for autocomplete. Each node represents a character, and paths from root to leaf represent complete strings. The key property: all strings sharing a common prefix share the same path from the root.
Trie for queries: "tree", "try", "true", "toy", "top"
(root)
/ \
t ...
/ \
r o
/ \ \
e u y p
| |
e e
Each node stores:
- Character
- Is this node the end of a valid query?
- Frequency count (how often this query was searched)
- Top-K suggestions for this prefix (precomputed)The naive approach to autocomplete with a trie: given a prefix, traverse to the prefix node, then explore all subtrees to find the most popular completions. This is O(total characters in all matching queries) -- far too slow.
Optimization: At each node, precompute and store the top K suggestions for that prefix. When the user types "ho", you simply look up the node for "ho" and return its precomputed list.
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
self.frequency = 0
self.top_suggestions = [] # precomputed top-K
class AutocompleteTrie:
def __init__(self, k=10):
self.root = TrieNode()
self.k = k
def insert(self, query, frequency):
node = self.root
for char in query:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
# Update top-K suggestions at every prefix node
self._update_top_k(node, query, frequency)
node.is_end = True
node.frequency = frequency
def _update_top_k(self, node, query, frequency):
# Check if query is already in top suggestions
for i, (q, f) in enumerate(node.top_suggestions):
if q == query:
node.top_suggestions[i] = (query, frequency)
node.top_suggestions.sort(key=lambda x: -x[1])
return
# Add if we have room or if this query is more popular
if len(node.top_suggestions) < self.k:
node.top_suggestions.append((query, frequency))
node.top_suggestions.sort(key=lambda x: -x[1])
elif frequency > node.top_suggestions[-1][1]:
node.top_suggestions[-1] = (query, frequency)
node.top_suggestions.sort(key=lambda x: -x[1])
def search(self, prefix):
node = self.root
for char in prefix:
if char not in node.children:
return []
node = node.children[char]
return node.top_suggestionsTrade-off: This uses more memory (each node stores K suggestions) but makes queries O(prefix length) -- constant time regardless of how many completions exist.
Query frequencies change slowly for most queries. The trie can be rebuilt periodically (e.g., every few hours) from aggregated query logs.
Data Collection Pipeline:
Search Logs → Kafka → Aggregation Service → Query Frequency DB
│
▼
Trie Builder (offline)
│
▼
Trie Snapshot (serialized)
│
▼
Autocomplete Servers
(load new trie into memory)The aggregation service maintains a rolling window (e.g., last 7 days) of query frequencies, applying exponential decay so that recent queries are weighted more heavily.
For breaking news and trending topics, waiting hours is too long. A separate real-time pipeline detects query spikes and injects them into the autocomplete system.
Search Logs → Kafka → Trend Detector → Trending Queries Cache
│
▼
Autocomplete Servers
(merge trending with trie results)The trend detector uses a sliding window counter (e.g., count queries in the last 15 minutes) and flags queries whose frequency exceeds a threshold relative to their historical baseline.
Since autocomplete is extremely read-heavy and the same prefixes are queried millions of times, aggressive caching is essential.
Prefix → Top-K Results Cache (Redis)
"how t" → ["how to tie a tie", "how to boil eggs", ...]
"weath" → ["weather today", "weather tomorrow", ...]
"pytho" → ["python tutorial", "python download", ...]Cache entries are keyed by the exact prefix string. With 50 million unique prefixes and 10 GB of cache, this covers the vast majority of requests.
Cache invalidation: When the trie is rebuilt, invalidate the entire cache (or use a versioned cache where each trie version has its own namespace).
The browser or mobile app can cache results locally. When the user types "how", the client receives suggestions. When they type "how ", the client can check if it already has results for "how " from a previous keystroke's response that included child prefixes. This eliminates network round trips for many keystrokes.
Users type fast. By the time the response for "ho" arrives, the user may have already typed "how t". The client should:
# Client-side pseudocode
class AutocompleteClient:
def __init__(self):
self.cache = {}
self.pending_request = None
self.debounce_timer = None
def on_keystroke(self, prefix):
# Check local cache first
if prefix in self.cache:
display(self.cache[prefix])
return
# Cancel pending request
if self.pending_request:
self.pending_request.cancel()
# Debounce: wait 50ms before sending
if self.debounce_timer:
self.debounce_timer.cancel()
self.debounce_timer = set_timer(50, lambda: self._send_request(prefix))
def _send_request(self, prefix):
self.pending_request = async_get(f"/autocomplete?q={prefix}")
self.pending_request.then(lambda results: self._handle_response(prefix, results))
def _handle_response(self, prefix, results):
self.cache[prefix] = results
display(results)The quality of autocomplete depends entirely on the quality of query frequency data. The pipeline must collect, clean, and aggregate search queries.
Log every search query with a timestamp. At Google scale, this is billions of events per day flowing into Kafka.
Raw logs:
"python tutorial" x 50,000 (today)
"python tutorial" x 45,000 (yesterday)
"python tutorial" x 40,000 (2 days ago)
Weighted frequency (exponential decay, half-life = 7 days):
Score = 50000 * 1.0 + 45000 * 0.91 + 40000 * 0.82 + ...The simplest ranking: sort suggestions by query frequency. This is what the trie stores. It works well for most users most of the time.
For logged-in users, boost suggestions based on their search history. If a user frequently searches for "python documentation", then when they type "py", "python documentation" should rank higher for them than for an average user.
Final Score = alpha * global_popularity + (1 - alpha) * personal_relevance
Where alpha = 0.7 (global popularity still dominates)Implementation: Store each user's recent queries in a lightweight profile. At query time, merge the global top-K from the trie with the user's personal top-K, re-rank, and return. This merge is fast because both lists are small (K=10).
Additional signals for ranking:
┌──────────────┐
User Query ────> │ Load Balancer│
└──────┬───────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ AC Svc 1 │ │ AC Svc 2 │ │ AC Svc N │
│(trie in │ │(trie in │ │(trie in │
│ memory) │ │ memory) │ │ memory) │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└─────────────┼─────────────┘
│
┌────────────┼────────────┐
▼ ▼ ▼
┌────────────┐ ┌──────────┐ ┌────────────┐
│ Redis Cache│ │ Trending │ │ User │
│(prefix→top │ │ Queries │ │ Profiles │
│ results) │ │ Cache │ │ (personal) │
└────────────┘ └──────────┘ └────────────┘
Offline Pipeline:
Search Logs → Kafka → Aggregator → Trie Builder → Distribute to AC ServersEach autocomplete server holds a complete copy of the trie in memory (~200 MB for top 10M queries). When the trie is rebuilt:
For larger datasets, shard the trie by prefix range. Server group A handles prefixes a-m, server group B handles n-z. A routing layer directs each request to the correct shard.