DNS and CDNs

How DNS resolution works, DNS-based load balancing, and how CDNs cache content at the edge to reduce latency for global users.

Nearly every request on the web starts with a DNS lookup (or a cached answer from an earlier one), and much of the web's content is served through a CDN. Many developers take these infrastructure layers for granted, but understanding how they work is essential for system design. DNS affects latency, availability, and load distribution. CDNs affect performance, scalability, and cost. Together, they form the foundation of global-scale web architecture.

DNS: The Internet's Phone Book

The Domain Name System translates human-readable domain names (`www.example.com`) into IP addresses (`93.184.216.34`). Without DNS, you would need to remember IP addresses for every website.

How DNS Resolution Works

When you type `www.example.com` into your browser, a multi-step resolution process occurs:

Step 1: Browser cache
  "Do I already know the IP for www.example.com?" → Cache miss

Step 2: OS cache
  "Does the operating system have it cached?" → Cache miss

Step 3: Recursive resolver (usually your ISP or 8.8.8.8)
  "Let me find out for you..."

Step 4: Root nameserver
  Resolver → Root: "Where is .com?"
  Root → Resolver: "Ask the .com TLD server at 192.5.6.30"

Step 5: TLD nameserver
  Resolver → .com TLD: "Where is example.com?"
  TLD → Resolver: "Ask the authoritative server at ns1.example.com (198.51.100.1)"

Step 6: Authoritative nameserver
  Resolver → Authoritative: "What is the IP for www.example.com?"
  Authoritative → Resolver: "93.184.216.34, TTL=3600"

Step 7: Resolver caches the result and returns it to the client

The same flow as a diagram:
Browser → OS → Recursive Resolver → Root NS → TLD NS → Authoritative NS
   │       │          │                 │          │            │
   │       │          │                 │          │         returns
   │       │          │                 │       returns      A record
   │       │          │              returns    NS for
   │       │       caches &          TLD NS    example.com
   │    caches &   returns
   │    returns
  uses
  IP

In practice, most of these steps are cached. The recursive resolver caches results for the TTL duration, so subsequent lookups for the same domain skip most of the hierarchy. A typical DNS lookup takes 20-120ms uncached. A hit in the browser or OS cache takes under 1ms; a hit in the resolver's cache costs only the round trip to the resolver.
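
The resolution chain and the effect of resolver caching can be sketched with an in-memory simulation. This is a toy model, not a real DNS client: the `ROOT`, `TLD`, and `AUTHORITATIVE` tables, the server labels, and the `RecursiveResolver` class are all hypothetical stand-ins for the servers in the steps above.

```python
import time

# Hypothetical, hard-coded data standing in for real nameservers.
ROOT = {"com": "tld-com"}
TLD = {"tld-com": {"example.com": "ns1.example.com"}}
AUTHORITATIVE = {"ns1.example.com": {"www.example.com": ("93.184.216.34", 3600)}}


class RecursiveResolver:
    """Toy recursive resolver that caches answers until their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cache = {}  # name -> (ip, expires_at)
        self.upstream_queries = 0

    def resolve(self, name):
        cached = self._cache.get(name)
        if cached and cached[1] > self._clock():
            return cached[0]  # cache hit: no upstream queries

        labels = name.split(".")
        tld_server = ROOT[labels[-1]]               # root: who serves .com?
        domain = ".".join(labels[-2:])
        auth_server = TLD[tld_server][domain]       # TLD: who serves example.com?
        ip, ttl = AUTHORITATIVE[auth_server][name]  # authoritative: the A record
        self.upstream_queries += 3

        self._cache[name] = (ip, self._clock() + ttl)
        return ip
```

Calling `resolve("www.example.com")` twice in a row performs the three upstream queries only once; the second call is answered from the cache until the TTL runs out.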

DNS Record Types

┌────────┬─────────────────────────┬──────────────────────────────────────┐
│ Type   │ Purpose                 │ Example                              │
├────────┼─────────────────────────┼──────────────────────────────────────┤
│ A      │ Maps name to IPv4       │ example.com → 93.184.216.34          │
│ AAAA   │ Maps name to IPv6       │ example.com → 2606:2800:220:1:...    │
│ CNAME  │ Alias to another name   │ www.example.com → example.com        │
│ MX     │ Mail server for domain  │ example.com → mail.example.com (10)  │
│ NS     │ Nameserver for domain   │ example.com → ns1.example.com        │
│ TXT    │ Arbitrary text          │ example.com → "v=spf1 ..."           │
│ SRV    │ Service location        │ _sip._tcp.example.com → sip:5060     │
│ SOA    │ Zone authority info     │ Primary NS, admin email, serial      │
│ PTR    │ Reverse lookup (IP→name)│ 34.216.184.93 → example.com          │
└────────┴─────────────────────────┴──────────────────────────────────────┘

Key details:

CNAME records cannot coexist with other records at the same name. You cannot have a CNAME and an MX record for `example.com`, and because the zone apex always carries NS and SOA records, a CNAME at the apex is not allowed at all. This is why many DNS providers offer "ALIAS" or "ANAME" as proprietary alternatives: they resolve the alias server-side and return an A record to the client.

MX records include a priority number. Mail servers try the lowest priority first: `MX 10 mail1.example.com` is preferred over `MX 20 mail2.example.com`.

TXT records are used for domain verification (Google, AWS), email authentication (SPF, DKIM, DMARC), and other metadata.
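
Two of these rules are easy to express in code. The sketch below assumes records are plain tuples; `mx_delivery_order` and `cname_conflicts` are hypothetical helper names, not part of any DNS library.

```python
def mx_delivery_order(mx_records):
    """Return mail hosts in the order a sender should try them.

    mx_records is a list of (priority, host) tuples; lowest priority value first.
    """
    return [host for _, host in sorted(mx_records)]


def cname_conflicts(records):
    """Return names that carry a CNAME alongside any other record type.

    records is a list of (name, record_type) tuples.
    """
    types_by_name = {}
    for name, rtype in records:
        types_by_name.setdefault(name, set()).add(rtype)
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )
```

For example, `mx_delivery_order([(20, "mail2.example.com"), (10, "mail1.example.com")])` puts `mail1.example.com` first, and a zone with both a CNAME and an MX at `example.com` is flagged by `cname_conflicts`.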

TTL (Time to Live)

Every DNS record has a TTL that tells resolvers how long to cache the result. TTL is a trade-off:

  • Low TTL (60-300 seconds): Changes propagate quickly. Useful during migrations, failovers, or when using DNS-based load balancing. Higher DNS query load.
  • High TTL (3600-86400 seconds): Fewer DNS queries, faster resolution from cache. Changes take longer to propagate globally.

Typical TTL choices for different kinds of records:
# Common TTL strategies
dns_records = {
    # Static content — rarely changes, cache aggressively
    "cdn.example.com": {"type": "CNAME", "value": "d123.cloudfront.net", "ttl": 86400},

    # Load-balanced service — moderate caching
    "api.example.com": {"type": "A", "value": "10.0.1.1", "ttl": 300},

    # Failover — short TTL for quick switching
    "db.example.com": {"type": "A", "value": "10.0.2.1", "ttl": 60},
}
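
The trade-off can be made concrete with a little arithmetic. This is a rough model under a simplifying assumption: a single resolver re-queries as soon as its cached answer expires, and a change made right after a resolver cached the old answer stays invisible for one full TTL. The function name `ttl_tradeoff` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_seconds):
    """Estimate query load and staleness for one resolver at a given TTL."""
    return {
        # A resolver that refreshes on every expiry asks at most this often.
        "max_queries_per_resolver_per_day": SECONDS_PER_DAY // ttl_seconds,
        # A change can stay hidden from that resolver for up to one TTL.
        "worst_case_staleness_seconds": ttl_seconds,
    }
```

With the TTLs above, a 60-second record costs up to 1,440 queries per resolver per day but is never more than a minute stale, while an 86,400-second record costs one query per day and can be stale for a full day.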

DNS-Based Load Balancing

DNS can distribute traffic across multiple servers by returning different IP addresses for the same domain name.

Round-robin DNS: The authoritative server returns multiple A records, cycling through them. Simple but provides no health checking — if one server dies, DNS keeps sending traffic to it until the TTL expires.

Weighted DNS: Return different IPs with different frequencies. Send 80% of traffic to the primary data center and 20% to the secondary.

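Geo DNS: Return different IPs based on the resolver's location (or the client's subnet, when the resolver forwards it). Users in Europe get the European server's IP; users in Asia get the Asian server's IP. Latency-based routing is a close relative that picks the region with the lowest measured latency rather than the nearest geography. AWS Route 53, Cloudflare, and Google Cloud DNS all support geographic routing.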

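Health-checked DNS: Monitor server health and only return IPs for healthy servers. AWS Route 53 health checks can stop returning an unhealthy endpoint within about 30 seconds when configured with fast check intervals, though clients keep using the cached answer until its TTL expires.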

User in Tokyo          User in London
     │                       │
     ▼                       ▼
   DNS Resolver            DNS Resolver
     │                       │
     ▼                       ▼
  Geo DNS:                Geo DNS:
  "Tokyo? → 10.0.1.1"    "London? → 10.0.2.1"
     │                       │
     ▼                       ▼
  Tokyo Server            London Server
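
The selection policies described above can be sketched as small functions. This is an illustration of the logic an authoritative server might apply, not any provider's implementation; the function names are hypothetical, and falling back to the full list when every server is unhealthy is a design choice made for this sketch.

```python
import itertools
import random


def round_robin(ips):
    """Cycle through the IPs in order, one per query."""
    return itertools.cycle(ips)


def pick_weighted(weights, rng=random):
    """Pick one IP with probability proportional to its weight.

    weights maps IP address -> relative weight (e.g. 80 and 20).
    """
    ips = list(weights)
    return rng.choices(ips, weights=[weights[ip] for ip in ips], k=1)[0]


def healthy_only(ips, is_healthy):
    """Drop IPs that fail the health check; return all of them if none pass."""
    alive = [ip for ip in ips if is_healthy(ip)]
    return alive or list(ips)
```

A health-checked weighted policy is then a composition: filter with `healthy_only` first, and pick from what remains.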

DNS Failure and Resilience

DNS is a critical dependency. If DNS fails, nothing works — browsers cannot even connect. Resilience strategies:

  • Multiple nameservers: Always have at least 2 NS records with different providers if possible.
  • Low TTL for failover records: So you can switch traffic quickly when a server dies.
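  • Client-side caching: Applications can keep using a cached DNS result past its TTL as a fallback when the resolver is unreachable, accepting the risk that the address is outdated.
  • DNS over HTTPS (DoH): Encrypts queries between the client and the resolver, which prevents on-path spoofing and makes censorship harder, though it bypasses corporate DNS policies. It does not authenticate the records themselves; that is what DNSSEC is for.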
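
The client-side caching fallback might look like the following sketch. `StaleTolerantCache` is a hypothetical class; because `socket.gethostbyname` does not expose the record's real TTL, the `ttl` here is an application-chosen value, which is an assumption of this example.

```python
import socket
import time


class StaleTolerantCache:
    """Cache lookups for `ttl` seconds, but reuse expired answers if DNS is down."""

    def __init__(self, lookup=socket.gethostbyname, ttl=60, clock=time.monotonic):
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._cache = {}  # host -> (ip, expires_at)

    def resolve(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry and entry[1] > now:
            return entry[0]  # still fresh
        try:
            ip = self._lookup(host)
        except OSError:
            if entry:
                return entry[0]  # resolver failed: fall back to the stale answer
            raise  # nothing cached, so the failure is surfaced
        self._cache[host] = (ip, now + self._ttl)
        return ip
```

The stale answer is only used when a fresh lookup fails, so normal TTL behavior is unchanged while DNS is healthy.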

CDNs: Content at the Edge

A Content Delivery Network is a globally distributed network of servers (edge locations or Points of Presence / PoPs) that cache content close to end users. Instead of every request traveling to your origin server (potentially thousands of miles away), most requests are served from a nearby edge node.

CDN Architecture

                    ┌──────────────┐
                    │ Origin Server│
                    │ (your server)│
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
        ┌─────┴──┐   ┌────┴───┐  ┌────┴───┐
        │  Edge  │   │  Edge  │  │  Edge  │
        │ Tokyo  │   │ London │  │ NYC    │
        └────┬───┘   └────┬───┘  └────┬───┘
             │            │           │
        Users in     Users in    Users in
        Asia         Europe      Americas

How CDN Caching Works

  1. First request (cache miss): User in Tokyo requests an image. The Tokyo edge does not have it, so it fetches from the origin server, caches the response, and serves it to the user.
  2. Subsequent requests (cache hit): Another user in Tokyo requests the same image. The Tokyo edge serves it directly from cache. No origin request needed.
  3. Cache expiry: After the cache TTL expires, the next request triggers a revalidation with the origin (`If-Modified-Since` or `If-None-Match`). If content has not changed, the origin returns 304 Not Modified, and the edge extends the cache.
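
The three cases can be simulated in a few lines. This is a simplified model assuming ETag-based revalidation only; the `Origin` and `Edge` classes and the `HIT` / `MISS` / `REVALIDATED` labels are hypothetical and not taken from any CDN's API.

```python
import time


class Origin:
    """Hypothetical origin server holding one version of each object."""

    def __init__(self):
        self.objects = {}  # path -> (etag, body)
        self.full_responses = 0

    def get(self, path, if_none_match=None):
        etag, body = self.objects[path]
        if if_none_match == etag:
            return 304, etag, None  # unchanged: no body is sent
        self.full_responses += 1
        return 200, etag, body


class Edge:
    """Toy edge node with a TTL cache and conditional revalidation."""

    def __init__(self, origin, ttl, clock=time.monotonic):
        self.origin = origin
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # path -> {"etag", "body", "expires_at"}

    def get(self, path):
        now = self.clock()
        entry = self.cache.get(path)
        if entry and entry["expires_at"] > now:
            return entry["body"], "HIT"
        if entry:
            # Expired: ask the origin whether our copy is still current.
            status, etag, body = self.origin.get(path, if_none_match=entry["etag"])
            if status == 304:
                entry["expires_at"] = now + self.ttl
                return entry["body"], "REVALIDATED"
        else:
            status, etag, body = self.origin.get(path)
        self.cache[path] = {"etag": etag, "body": body, "expires_at": now + self.ttl}
        return body, "MISS"
```

Note that a revalidation that returns 304 still costs a round trip to the origin, but it transfers no body, which is why `full_responses` does not increase.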

Cache-Control Headers

HTTP cache-control headers tell CDNs (and browsers) how to cache responses:

Cache-Control: public, max-age=31536000
  "Anyone can cache this for 1 year" — perfect for versioned assets (app.a1b2c3.js)

Cache-Control: private, max-age=0, no-cache
  "Only the browser can cache, and must revalidate every time" — user-specific pages

Cache-Control: no-store
  "Never cache this anywhere" — sensitive data, authentication responses

Cache-Control: public, s-maxage=3600, max-age=60
  "CDN caches for 1 hour, browser caches for 1 minute" — different TTLs per layer

Cache-Control: max-age=60, stale-while-revalidate=60
  "Fresh for 60s; for the next 60s, serve the stale copy while fetching a fresh one in the background"

The same policies set from application code (Flask):
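# Setting cache headers in a web application
import hashlib
import json

from flask import Flask, jsonify, make_response, send_from_directory

# static_folder=None disables Flask's built-in /static route so ours is used
app = Flask(__name__, static_folder=None)


def get_user_profile():
    # Placeholder data; replace with a lookup for the authenticated user
    return {"id": 1, "name": "Ada"}


def get_products():
    # Placeholder data; replace with a real query
    return [{"id": 1, "name": "Widget"}]


def compute_etag(data):
    # ETag values must be quoted strings; hash the serialized payload
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


@app.route("/static/<path:filename>")
def serve_static(filename):
    # send_from_directory refuses paths that escape the static directory
    response = make_response(send_from_directory("static", filename))
    # Immutable, versioned assets: cache for a year
    response.headers["Cache-Control"] = "public, max-age=31536000, immutable"
    return response


@app.route("/api/user/profile")
def user_profile():
    response = make_response(jsonify(get_user_profile()))
    # User-specific data: keep it out of the CDN, revalidate in the browser
    response.headers["Cache-Control"] = "private, no-cache"
    return response


@app.route("/api/products")
def products():
    data = get_products()
    response = make_response(jsonify(data))
    # Public data: CDN caches for 5 min, browser for 1 min
    response.headers["Cache-Control"] = "public, s-maxage=300, max-age=60"
    response.headers["ETag"] = compute_etag(data)
    return response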
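
To see how a cache might interpret these headers, here is a minimal sketch of the decision a browser or CDN makes. It covers only the directives discussed above and returns 0 when no lifetime is given, whereas real caches may apply heuristic freshness; `parse_cache_control` and `freshness_seconds` are hypothetical helper names.

```python
def parse_cache_control(header):
    """Split a Cache-Control header into a {directive: value-or-True} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value.strip('"') if value else True
    return directives


def freshness_seconds(header, shared_cache):
    """Return how long a cache may reuse the response without revalidating.

    shared_cache=True models a CDN, False models a browser.
    Returns None if this cache must not store the response at all.
    """
    d = parse_cache_control(header)
    if "no-store" in d or (shared_cache and "private" in d):
        return None
    if "no-cache" in d:
        return 0  # storable, but must revalidate before every reuse
    if shared_cache and "s-maxage" in d:
        return int(d["s-maxage"])  # s-maxage overrides max-age for shared caches
    return int(d.get("max-age", 0))
```

For `public, s-maxage=3600, max-age=60`, this gives 3600 for the CDN and 60 for the browser, matching the per-layer TTLs described above.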

CDN Cache Invalidation

The hardest problem with CDN caching is invalidation — how do you update cached content?

Time-based expiry (TTL): Set a TTL and wait. Simple, but users see stale content until expiry.

Versioned URLs: Include a hash in the URL (`app.a1b2c3.js`). New deploys create new URLs, so old cached content is irrelevant. This is the best practice for static assets.

Purge API: Most CDNs provide an API to invalidate specific URLs or patterns. CloudFront invalidations, for example, can take several minutes to propagate globally, so a purge is not instantaneous.

Stale-while-revalidate: Serve stale content immediately while fetching fresh content in the background. The user gets a fast (stale) response, and the next user gets fresh content.
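
Versioned URLs are usually produced by a build step that embeds a content hash in the filename. A minimal sketch, assuming SHA-256 and an 8-character digest (both arbitrary choices for this example; `versioned_name` is a hypothetical helper):

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path, content, digest_len=8):
    """Insert a short content hash before the file extension.

    path is the logical asset path, content is the file's bytes.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name depends only on the bytes, an unchanged file keeps its URL (and its cached copies) across deploys, while any change produces a new URL that no cache has seen yet.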

Types of Content CDNs Serve

  • Static assets: JavaScript, CSS, images, fonts, videos. This is the bread and butter of CDNs.
  • Dynamic content: API responses can be cached at the edge with short TTLs. This reduces origin load for popular endpoints.
  • Whole page caching: Static site generators + CDN can serve entire websites without an application server; the origin is just static file storage.
  • Video streaming: CDNs handle chunked video delivery (HLS/DASH segments), distributing the bandwidth-intensive work across edge nodes.

Edge Computing

Modern CDNs go beyond caching — they run code at the edge:

  • Cloudflare Workers: Run JavaScript/WASM at 300+ edge locations. Process requests, transform responses, implement A/B testing, all at the edge.
  • AWS Lambda@Edge / CloudFront Functions: Run lightweight functions during CloudFront request/response processing.
  • Vercel Edge Functions: Deploy serverless functions to edge locations for low-latency API endpoints.

Edge computing is particularly powerful for:

  • Request routing and A/B testing
  • Authentication and authorization
  • Image optimization and resizing
  • Personalization (geo-based content, language detection)
  • Bot detection and security filtering
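
As one example of edge logic, A/B assignment can be done statelessly by hashing the user and experiment name. The sketch below shows the idea in Python for consistency with the other examples; an actual edge function would be written in the platform's own runtime, and `ab_bucket` is a hypothetical name.

```python
import hashlib


def ab_bucket(user_id, experiment, treatment_percent=50):
    """Deterministically assign a user to 'treatment' or 'control'.

    The same (user_id, experiment) pair always lands in the same bucket,
    so no per-user state has to be stored at the edge.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "treatment" if slot < treatment_percent else "control"
```

Including the experiment name in the hash keeps assignments independent across experiments, so a user in the treatment group of one test is not automatically in the treatment group of the next.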

DNS + CDN: Working Together

In a typical setup, DNS and CDN work hand-in-hand:

1. User requests www.example.com
2. DNS resolves to CDN (CNAME → d123.cloudfront.net)
3. CDN's DNS uses geo-routing to pick the nearest edge
4. Edge checks its cache
5. Cache hit → serve directly (fast!)
6. Cache miss → fetch from origin, cache, serve

An example configuration for this setup:
DNS config:
  www.example.com  CNAME  d123.cloudfront.net   TTL=3600
  api.example.com  A      10.0.1.1              TTL=300

CDN config:
  Origin: origin.example.com:443
  Cache behavior for /static/*: max-age=31536000
  Cache behavior for /api/*: max-age=60
  Default: no-cache (pass through to origin)

Performance Impact

The performance difference between serving from origin vs CDN is dramatic:

Without CDN (origin in US-East):
  User in Tokyo → US-East → Tokyo: ~200ms round trip
  Each asset: 200ms × number of assets

With CDN (edge in Tokyo):
  User in Tokyo → Tokyo edge → User: ~10ms round trip
  First request: 200ms (cache miss, fetch from origin)
  Subsequent requests: ~10ms (cache hit)

For a page with 50 assets:
  Without CDN: 50 × 200ms = 10 seconds (sequential)
  With CDN: 50 × 10ms = 500ms (sequential, after warm cache)
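
The sequential figures are a worst case, since browsers fetch several assets at once. A rough model, assuming one round trip per asset and ignoring bandwidth, TLS setup, and server time (the six-connection figure is the typical per-host limit browsers use for HTTP/1.1; `load_time_ms` is a hypothetical helper):

```python
import math


def load_time_ms(assets, rtt_ms, parallel=1):
    """Estimate load time as batches of `parallel` fetches, one round trip each."""
    return math.ceil(assets / parallel) * rtt_ms


sequential_origin = load_time_ms(50, 200)            # 10000 ms
sequential_cdn = load_time_ms(50, 10)                # 500 ms
parallel_origin = load_time_ms(50, 200, parallel=6)  # 9 batches -> 1800 ms
parallel_cdn = load_time_ms(50, 10, parallel=6)      # 9 batches -> 90 ms
```

Parallelism shrinks both numbers, but the ratio between origin and edge stays the same because it is set by the round-trip time.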

Interview Tips

  1. Mention DNS early in any design. "The first thing that happens when a user visits our service is a DNS lookup. We use geo-based DNS routing to direct users to the nearest data center." This shows you think about the full request path.
  2. Know the resolution chain. Being able to explain browser cache, OS cache, recursive resolver, root, TLD, and authoritative nameserver demonstrates depth.
  3. Use CDNs for static content. "All static assets are served from a CDN with immutable cache headers and versioned URLs. This gives us sub-50ms load times globally and offloads 80%+ of our bandwidth from the origin."
  4. Discuss cache invalidation. "We use versioned URLs for JavaScript and CSS bundles, so deploys are instant. For API responses cached at the edge, we use short TTLs (60 seconds) with stale-while-revalidate."
  5. Connect DNS to availability. "We configure health-checked DNS with Route 53. If our primary data center goes down, DNS automatically routes to the secondary within 60 seconds."
  6. Quantify the impact. "Moving our assets to a CDN reduced P50 page load time from 2.5 seconds to 800ms for international users." Concrete numbers make your answers memorable.