Classic Designs

Ride-Sharing Service

Design Uber or Lyft. Covers geospatial indexing, real-time driver matching, ETA calculation, surge pricing, and location tracking at scale.

Designing a Ride-Sharing Service

Designing Uber or Lyft is a rich system design question that combines geospatial indexing, real-time matching, dynamic pricing, and location tracking at massive scale. The core challenge is connecting a rider to the nearest available driver within seconds, while continuously tracking millions of moving vehicles and responding to fluctuating demand.

Requirements

Functional Requirements

Riders can request a ride by specifying a pickup and dropoff location.
The system matches the rider with the nearest available driver.
Both rider and driver see real-time location updates during the trip.
The system calculates fare based on distance, time, and surge pricing.
Trip lifecycle management: request, match, pickup, in-progress, completed, cancelled.

Non-Functional Requirements

Low latency: Matching a rider to a driver should complete within 5 seconds.
Real-time tracking: Location updates every 3-5 seconds for active trips.
Scale: Support 20 million rides per day (Uber scale), with 5 million concurrent drivers.
High availability: The system must never go down during peak hours. Partial degradation is acceptable.
Consistency: A driver must never be matched to two riders simultaneously.

Capacity Estimation

Assumptions:
  - 5 million active drivers sending location every 4 seconds
  - Location updates: 5M / 4 = 1.25 million updates/sec
  - 20 million rides/day = ~230 ride requests/sec (bursty: peak 5-10x)

Location data:
  - Each update: driver_id (8 bytes) + lat (8) + lng (8) + timestamp (8) = 32 bytes
  - 1.25M updates/sec × 32 bytes = 40 MB/sec ≈ 320 Mbps

Storage (for trip history):
  - 20M trips/day × 365 days × 1 KB metadata = 7.3 TB/year
  - Location traces: 20M trips × avg 20 min × 1 update/4sec × 32 bytes = 58 TB/year

Geospatial Indexing

The fundamental operation of a ride-sharing service is: "Given a rider's location, find the K nearest available drivers." This is a spatial nearest-neighbor query, and it must be fast (< 100ms) over millions of points that are constantly moving.

Geohash

A geohash encodes a latitude/longitude pair into a string where nearby locations share a common prefix. The longer the prefix, the more precise the location.

Geohash examples:
  San Francisco downtown: "9q8yyk"
  A block away:           "9q8yym"
  Across town:            "9q8yh3"
  Los Angeles:            "9q5cs5"

Precision levels:
  4 chars → ~20 km × 20 km cell
  5 chars → ~5 km × 5 km cell
  6 chars → ~1.2 km × 0.6 km cell
  7 chars → ~150 m × 150 m cell

How to use geohash for driver lookup:

Store each active driver's location with their geohash in a data store (Redis sorted set or in-memory index).
When a rider requests a ride, compute the rider's geohash.
Query for all drivers in the same geohash cell and the 8 neighboring cells (to avoid edge effects).
Filter to available drivers and sort by actual distance.

import geohash2

class DriverIndex:
    def __init__(self, redis_client):
        self.redis = redis_client

    def update_driver_location(self, driver_id, lat, lng):
        gh = geohash2.encode(lat, lng, precision=6)
        # Remove from old geohash cell, add to new one
        self.redis.geoadd("drivers:locations", lng, lat, driver_id)
        self.redis.hset("drivers:geohash", driver_id, gh)
        self.redis.sadd(f"drivers:cell:{gh}", driver_id)

    def find_nearby_drivers(self, lat, lng, radius_km=5, limit=10):
        # Use Redis GEORADIUS for efficient spatial query
        nearby = self.redis.georadius(
            "drivers:locations", lng, lat, radius_km, unit="km",
            withcoord=True, withdist=True, count=limit, sort="ASC"
        )
        return [(driver_id, dist) for driver_id, dist, coord in nearby]

Quadtree

A quadtree recursively divides the 2D space into four quadrants. Each leaf node covers a geographic area and contains the drivers in that area. When a leaf has too many drivers (e.g., > 100), it splits into four children. When it has too few, it merges back.

Quadtree for a city:

    ┌───────────┬───────────┐
    │           │   NE      │
    │   NW      │  (15      │
    │  (8       │  drivers) │
    │  drivers) ├─────┬─────┤
    │           │     │     │
    ├───────────┤ SE1 │ SE2 │
    │   SW      │(120)│(85) │
    │  (42      │split│     │
    │  drivers) │into4│     │
    └───────────┴─────┴─────┘

Dense areas (downtown) have more subdivisions.
Sparse areas (suburbs) have larger cells.

Geohash vs Quadtree:

Geohash: Simpler to implement, works well with Redis/databases, fixed grid size. Poor adaptation to varying driver density.
Quadtree: Adaptive to density (more precision where there are more drivers). More complex to implement and update.

In practice, Redis GEO commands (which use a geohash-based sorted set internally) are the most common choice because they are simple, fast, and handle the update rate well.

Real-Time Driver Matching

When a rider requests a ride, the matching algorithm must quickly find the best driver. "Best" is a function of several factors:

Matching Algorithm

def match_rider_to_driver(rider_location, ride_type):
    # 1. Find nearby available drivers
    candidates = find_nearby_drivers(rider_location, radius_km=5)

    # 2. Filter by ride type (UberX, UberXL, etc.)
    candidates = [d for d in candidates if d.vehicle_type == ride_type]

    # 3. Calculate ETA for each candidate
    for driver in candidates:
        driver.eta = calculate_eta(driver.location, rider_location)

    # 4. Score and rank
    for driver in candidates:
        driver.score = (
            0.5 * normalize(driver.eta) +          # closer is better
            0.3 * normalize(driver.rating) +        # higher rated is better
            0.2 * normalize(driver.acceptance_rate)  # more reliable is better
        )

    # 5. Send request to the top candidate
    best_driver = min(candidates, key=lambda d: d.score)
    send_ride_request(best_driver, timeout=15)  # 15 seconds to accept

Handling Rejections and Timeouts

If the first driver does not accept within 15 seconds (or declines), the system moves to the next best candidate. This continues until a driver accepts or the system gives up (typically after 3-5 attempts).

Concurrent matching protection: Once a ride request is sent to a driver, that driver must be temporarily locked so they do not receive another request simultaneously. Use a distributed lock (Redis SETNX with TTL) on the driver ID.

ETA Calculation

ETA (Estimated Time of Arrival) is critical for both matching (choose the fastest driver) and user experience (show the rider how long they will wait).

Road Network Routing

Computing ETA requires a shortest-path calculation on the road network graph. At scale, you cannot run Dijkstra's algorithm for every request.

Precomputation strategies:
  - Partition the road network into regions.
  - Precompute shortest paths between region boundaries.
  - For an ETA query: route within the source region → cross regions
    using precomputed paths → route within the destination region.

Real-time adjustments:
  - Multiply road segment travel times by a traffic factor.
  - Traffic factors are updated every few minutes from live GPS data
    (the drivers themselves are traffic sensors).

Services like Uber use a combination of precomputed routing tables and real-time traffic data from their own fleet of drivers to produce accurate ETAs.

Surge Pricing

When demand exceeds supply in an area, surge pricing (dynamic pricing) increases fares to incentivize more drivers to enter the area and to reduce demand.

Surge Pricing Algorithm:

1. Divide the city into zones (geohash cells or custom polygons).
2. For each zone, compute:
   - demand = number of ride requests in the last 5 minutes
   - supply = number of available drivers in the zone
   - ratio = demand / supply
3. If ratio > threshold (e.g., 1.5):
   surge_multiplier = f(ratio)  # e.g., 1.5x, 2.0x, 3.0x

Zone: Downtown
  Demand: 200 requests / 5 min
  Supply: 50 drivers
  Ratio: 4.0
  Surge: 2.5x

Zone: Suburbs
  Demand: 30 requests / 5 min
  Supply: 40 drivers
  Ratio: 0.75
  Surge: 1.0x (no surge)

Surge multipliers are recalculated every 1-2 minutes. The rider sees the multiplier before confirming the ride.

Trip Lifecycle State Machine

A trip goes through a well-defined set of states. Modeling this as an explicit state machine prevents invalid transitions and makes the system easier to reason about.

Trip State Machine:

  REQUESTED → MATCHING → DRIVER_ASSIGNED → DRIVER_EN_ROUTE
                │                               │
                ▼                               ▼
           NO_DRIVERS_FOUND              DRIVER_ARRIVED
                │                               │
                ▼                               ▼
            CANCELLED                     IN_PROGRESS
                                               │
                                          ┌────┴────┐
                                          ▼         ▼
                                     COMPLETED  CANCELLED

Valid transitions:
  REQUESTED → MATCHING (system starts searching)
  MATCHING → DRIVER_ASSIGNED (driver accepts)
  MATCHING → NO_DRIVERS_FOUND (all candidates rejected/timed out)
  DRIVER_ASSIGNED → DRIVER_EN_ROUTE (driver starts navigating)
  DRIVER_EN_ROUTE → DRIVER_ARRIVED (driver at pickup)
  DRIVER_ARRIVED → IN_PROGRESS (rider picked up)
  IN_PROGRESS → COMPLETED (arrived at destination)
  Any state → CANCELLED (rider or driver cancels)

class Trip:
    VALID_TRANSITIONS = {
        "REQUESTED": ["MATCHING", "CANCELLED"],
        "MATCHING": ["DRIVER_ASSIGNED", "NO_DRIVERS_FOUND", "CANCELLED"],
        "DRIVER_ASSIGNED": ["DRIVER_EN_ROUTE", "CANCELLED"],
        "DRIVER_EN_ROUTE": ["DRIVER_ARRIVED", "CANCELLED"],
        "DRIVER_ARRIVED": ["IN_PROGRESS", "CANCELLED"],
        "IN_PROGRESS": ["COMPLETED", "CANCELLED"],
    }

    def transition(self, new_state):
        if new_state not in self.VALID_TRANSITIONS.get(self.state, []):
            raise InvalidTransitionError(
                f"Cannot transition from {self.state} to {new_state}"
            )
        self.state = new_state
        self.updated_at = now()
        publish_event(TripStateChanged(self.trip_id, self.state))

Location Tracking at Scale

With 5 million active drivers sending updates every 4 seconds, the system must handle 1.25 million location writes per second.

Architecture for Location Ingestion

Driver Apps → Load Balancer → Location Ingestion Service → Kafka
                                                              │
                              ┌────────────────────────────────┼───────┐
                              ▼                                ▼       ▼
                     ┌──────────────┐                  ┌────────────┐ ┌──────┐
                     │ Driver Index │                  │ Trip       │ │ ETA  │
                     │ (Redis GEO)  │                  │ Tracker    │ │Update│
                     │ update driver│                  │ (store     │ │Svc   │
                     │ positions    │                  │  traces)   │ │      │
                     └──────────────┘                  └────────────┘ └──────┘

Kafka decouples ingestion from processing. Drivers publish locations to Kafka; multiple consumers process them independently.
Redis GEO is updated by one consumer to keep the spatial index current for matching.
Trip Tracker stores location traces for active trips (for map display and post-trip analysis).
ETA Update Service recalculates ETAs for active trips using the latest driver positions.

Optimizing Location Updates

Adaptive frequency: When the driver is idle and stationary, reduce update frequency to once per minute. When en route to a rider or on a trip, increase to every 3-4 seconds.
Delta compression: Instead of sending absolute coordinates each time, send the delta from the previous position. At typical driving speeds, deltas are small and compress well.
Batching: Buffer 2-3 updates on the client and send them in a single request to reduce connection overhead.

High-Level Architecture

               ┌──────────────┐
  Clients ────>│ Load Balancer│
               └──────┬───────┘
                      │
         ┌────────────┼────────────┐
         ▼            ▼            ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │ Ride     │ │ Location │ │ Pricing  │
   │ Service  │ │ Service  │ │ Service  │
   └────┬─────┘ └────┬─────┘ └────┬─────┘
        │             │             │
        ▼             ▼             ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │ Trip DB  │ │ Redis    │ │ Kafka    │
   │ (SQL)    │ │ (GEO     │ │(location │
   │          │ │  index)  │ │ events)  │
   └──────────┘ └──────────┘ └──────────┘

Interview Tips

Start with geospatial indexing. Explain geohash or quadtree for finding nearby drivers. This is the most technically interesting component and interviewers expect you to go deep here.
Discuss the matching algorithm in detail. It is not just "find the closest driver." Explain the multi-factor scoring (distance, rating, acceptance rate) and how you handle rejections.
Model the trip as a state machine. This shows you think about correctness and edge cases. What happens if the rider cancels during matching? What if the driver's app crashes mid-trip?
Quantify the location update rate. 1.25 million writes/sec is non-trivial. Explain why Kafka is needed as a buffer and how Redis GEO handles the spatial queries.
Explain surge pricing at a high level. You do not need to derive the pricing formula, but explain the supply/demand ratio concept and how zones are defined.
Mention consistency for driver matching. A driver must never be double-booked. Explain the distributed lock mechanism that prevents two ride requests from being sent to the same driver simultaneously.