System Design & Computer Networks 101 — Part 2: DNS and Domain Name Resolution

January 17, 2026 · Durgesh Rai
DNS · domain name · system design · networking · scalability

📘 System Design & Computer Networks 101

This post is part of a beginner-friendly series that builds strong fundamentals in system design and computer networks, from MVP thinking to large-scale distributed systems.

You are reading: Part 2 of 7

📚 Series Roadmap

This series will progress step by step:

  1. High-Level Design & MVP Thinking

  2. DNS and Domain Name Resolution ← you are here

  3. Server Infrastructure and Database Design

  4. Vertical vs Horizontal Scaling

  5. Load Balancing and Service Discovery

  6. Advanced Load Balancing and Routing

  7. Domain Registration, DNS Management, and HLD Philosophy


The Core Problem 🎯

When building distributed systems, we face a fundamental challenge: machines communicate via IP addresses, but humans work with domain names. Understanding how this translation happens at scale reveals critical system design principles about bottlenecks, caching, and fault tolerance.

IP Addresses: The Foundation

Every device connected to the internet has an IP address. Direct IP-based communication is possible:

http://142.250.185.46 → one of Google’s servers (the exact address varies by region and over time)

Key insight: Domain names are an abstraction layer. Underneath, internet traffic is routed entirely by IP address.

Security consideration: Direct IP access can be blocked. Services like CloudFront reject direct IP connections as a security measure, forcing clients through proper domain resolution paths.


The Human-Machine Interface Gap 🧠

IP addresses are machine-readable but not human-friendly. Consider:

  1. Can you remember IPs for 50+ frequently visited sites? No.

  2. Do IPs change when infrastructure updates? Yes.

  3. Are new domains created constantly? Yes.

This necessitates a mapping system: domain name → IP address

DNS: Domain Name System 📡

DNS functions as a distributed directory service. When you request example.com:

  1. Browser needs the IP address

  2. Browser queries DNS infrastructure

  3. DNS returns the mapped IP

  4. Browser connects to that IP

  5. Server responds with content
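Steps 1 to 3 of the flow above can be sketched in Python, assuming the operating system's configured resolver does the actual DNS work. `resolve_ipv4` and its `lookup` parameter are hypothetical names introduced for illustration; `socket.getaddrinfo` is the standard-library call that asks the OS to resolve a name.

```python
import socket


def resolve_ipv4(hostname, lookup=socket.getaddrinfo):
    """Return the unique IPv4 addresses for hostname, in the order received.

    `lookup` defaults to the OS resolver; it is a parameter (an assumption of
    this sketch) so the function can be exercised without network access.
    """
    results = lookup(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a machine with network access, `resolve_ipv4("example.com")` returns whatever addresses the configured resolver reports, so the result can differ by location and over time.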

ICANN: The Central Authority 👑

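ICANN (Internet Corporation for Assigned Names and Numbers) coordinates the global domain name space: it oversees the root of the DNS and accredits the registries and registrars that record who holds each domain. For this article, treat it as the central authority at the top of the system; the individual domain-to-IP records are published on each domain's name servers.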

Domain registration flow:

  1. Purchase domain through registrar (GoDaddy, Cloudflare, etc.)

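  2. Registrar submits the registration to the registry for that top-level domain (such as .com), which operates under ICANN policy

  3. Registry publishes the domain's name server records

  4. Domain becomes resolvable

Note: Registrars are brokers. The registry, operating under ICANN's rules, is the source of truth for who holds a domain.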

Domain ownership: First-come, first-served. Once owned, domains are tradable assets (premium domains can sell for millions).


The Architectural Problem: Scale and Fragility ⚠️

Consider the naive approach - all DNS queries hit ICANN directly:

Scale:

  1. 5+ billion internet users

  2. Tens of billions of connected devices

  3. Every web request requires domain resolution

Problem #1 - Bottleneck 🍾

ICANN servers become a choke point. Billions of concurrent requests for IP resolution would overwhelm any centralized system, causing severe latency and throughput degradation.
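A rough back-of-the-envelope sketch makes the bottleneck concrete. The 5 billion users figure comes from the list above; the 100 lookups per user per day is an assumed, illustrative number, not a measurement, and `average_queries_per_second` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_queries_per_second(users, lookups_per_user_per_day):
    """Average load if every lookup went to one central service."""
    return users * lookups_per_user_per_day / SECONDS_PER_DAY


# 5 billion users is from the text; 100 lookups per user per day is assumed.
central_load = average_queries_per_second(5_000_000_000, 100)
print(f"{central_load:,.0f} queries per second on average")
```

Under these assumptions the average alone is in the millions of queries per second, before accounting for traffic peaks or non-human devices.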

Problem #2 - Single Point of Failure 💥

If ICANN’s infrastructure fails, global DNS resolution stops. No domain names resolve. The internet effectively goes down. This is architecturally unacceptable for a system requiring five nines (99.999%) availability.

The Central Design Challenge 🤔

We need:

  • ✅ Centralized authority for domain ownership (ICANN)

  • ❌ Cannot have all queries hitting central servers

How do we resolve this contradiction?

The solution involves distributed caching, hierarchical DNS architecture, and TTL-based invalidation strategies.


The Solution: Hierarchical DNS Architecture 🧅

Rather than direct ICANN queries, DNS uses a multi-tier architecture:

Architecture layers:

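  1. ICANN: Coordinates the root of the namespace (top of hierarchy)

  2. Root DNS Servers: 13 named root server identities (a through m), each replicated across many physical instances worldwide; they serve the root zone, which points resolvers to the right top-level domain (TLD) servers

  3. Lower-tier DNS Servers: Hundreds of thousands of distributed servers worldwide, including TLD servers, each domain's authoritative servers, and caching resolvers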

Key principle: Clients query distributed DNS servers, not ICANN directly.
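The tiered lookup can be sketched with toy data. In real DNS the tiers are usually described as root servers, top-level domain (TLD) servers, and a domain's authoritative servers; a resolver walks them in that order. Every server name and address below is made up, and treating the last two labels as the registered domain is a simplifying assumption that does not hold for suffixes such as co.uk.

```python
# Toy data only: every server name and address here is made up for illustration.
ROOT_ZONE = {"com": "tld-com"}
TLD_ZONES = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {
    "ns-example": {"example.com": "192.0.2.10", "www.example.com": "192.0.2.11"},
}


def resolve_iteratively(name):
    """Walk root -> TLD -> authoritative and return (ip, trail of referrals)."""
    name = name.rstrip(".").lower()
    labels = name.split(".")
    registered = ".".join(labels[-2:])  # simplification: last two labels
    trail = []

    # Step 1: the root only knows which server handles the TLD.
    tld_server = ROOT_ZONE.get(labels[-1])
    trail.append(("root", tld_server))
    if tld_server is None:
        return None, trail

    # Step 2: the TLD server only knows the domain's authoritative server.
    auth_server = TLD_ZONES[tld_server].get(registered)
    trail.append((tld_server, auth_server))
    if auth_server is None:
        return None, trail

    # Step 3: the authoritative server holds the actual name -> IP record.
    ip = AUTHORITATIVE[auth_server].get(name)
    trail.append((auth_server, ip))
    return ip, trail
```

The point of the sketch is that no single tier holds everything: each one answers a narrow question and refers the resolver one step further down.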

Eliminating Single Point of Failure 💪

Scenario 1: ICANN outage

  1. Impact: None on resolution

  2. Reason: Distributed DNS servers have cached/replicated data

  3. Result: Internet continues functioning

Scenario 2: Multiple DNS server failures

  1. Impact: Minimal

  2. Reason: Hundreds of thousands of servers globally

  3. Result: Traffic routes to available servers

The distribution eliminates bottlenecks and single points of failure simultaneously.
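Scenario 2 can be sketched as a client that tries several DNS servers in turn. `resolve_with_failover` and the `query` callable are hypothetical; `query(server, name)` stands in for whatever actually sends the DNS request and is assumed to raise `OSError` (which includes timeouts) when a server is unreachable.

```python
def resolve_with_failover(name, servers, query):
    """Try each server in order and return the first successful answer.

    `query(server, name)` is an assumed callable that returns an answer or
    raises OSError (TimeoutError is a subclass) when the server is unavailable.
    """
    failures = []
    for server in servers:
        try:
            return query(server, name)
        except OSError as exc:
            failures.append(f"{server}: {exc}")  # remember why, then move on
    raise RuntimeError(f"all DNS servers failed for {name}: {failures}")
```

Because any one reachable server is enough, the lookup fails only when every server in the list is down at once.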


DNS Caching: Performance at Every Layer ⚡

Caching occurs at multiple levels:

  1. ✅ Local machine (browser/OS cache)

  2. ✅ Router cache

  3. ✅ ISP DNS server cache

  4. ✅ Higher-tier DNS servers

  5. ✅ Referrals from root and top-level domain (TLD) servers, which resolvers cache so that the root is rarely contacted

Impact: The first query requires full DNS resolution. Subsequent queries are answered from cache and are effectively instantaneous.

TTL (Time To Live): Cache entries expire based on configured TTL, ensuring eventual consistency when IP mappings change.
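A minimal sketch of TTL-based expiry, assuming a simple in-memory cache. `TTLCache` is a hypothetical class, not how any particular resolver is implemented; the `clock` parameter exists only so expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Cache name -> IP entries that expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds):
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never cached: caller must do a full lookup
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: force a fresh lookup
            return None
        return ip
```

A short TTL means changes propagate quickly but more queries reach upstream servers; a long TTL does the opposite.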


DNS Server Maintenance: Who and Why 💰

Organizations maintaining DNS infrastructure:

1. Tech Giants (Google, Cloudflare)

  1. Internet downtime = revenue loss (millions per minute)

  2. Vested interest in stability and performance

  3. Operate public DNS servers (8.8.8.8, 1.1.1.1)

2. Governments

  1. National security concerns

  2. Economic stability requirements

  3. Communication infrastructure dependencies

3. ISPs (Internet Service Providers)

  1. Customer service quality (slow DNS = complaints)

  2. Control over user traffic routing

  3. Default DNS configuration for customers

ISP DNS Control and Override 🔧

Default behavior:

  1. ISPs automatically configure routers to use their DNS servers

  2. Users typically unaware of this configuration

  3. ISP controls resolution by default

Custom DNS configuration: Users can override ISP DNS by manually configuring:

  1. Google Public DNS: 8.8.8.8, 8.8.4.4

  2. Cloudflare DNS: 1.1.1.1

  3. Other public DNS providers
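The same override can be done for a single lookup by sending a query straight to a chosen server. Python's standard library has no call for picking the DNS server, so this sketch builds a minimal query packet by hand, following the standard DNS message layout (12-byte header, then the question). `build_a_query` and `query_server` are hypothetical helpers, and the response is returned as raw bytes rather than parsed.

```python
import socket
import struct


def build_a_query(hostname, query_id=0x1234):
    """Build a DNS query packet asking for the A (IPv4) record of hostname."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name is encoded as length-prefixed labels, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # type A, class IN
    return header + question


def query_server(hostname, server_ip, timeout=2.0):
    """Send the query over UDP port 53 to server_ip and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(hostname), (server_ip, 53))
        response, _ = sock.recvfrom(512)
    return response
```

For example, `query_server("example.com", "8.8.8.8")` asks Google Public DNS directly, regardless of which resolver the router handed out.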

DNS-Based Censorship 😈

How ISPs Block Websites

Method: DNS Poisoning

Example: ISP blocks example.com

  1. ISP operates custom DNS server

  2. ISP’s DNS database: example.com → "Does not exist"

  3. User queries: “What’s the IP for example.com?”

  4. ISP DNS responds: “Unknown domain”

  5. User perspective: Website doesn’t exist

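Workaround: Use a public DNS resolver (such as 8.8.8.8) to bypass the ISP’s resolver. The public resolver returns the actual IP address, provided the ISP does not also intercept or redirect DNS traffic.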

Note: DNS poisoning is one method among several for website blocking. Other methods include IP-based blocking and deep packet inspection.
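One hedged way to spot this kind of blocking is to compare the response codes from two resolvers. In a DNS reply, the low four bits of the fourth header byte carry the response code: 0 means no error and 3 means the name does not exist (NXDOMAIN). `response_code` and `possibly_filtered` are hypothetical helpers, and a mismatch is only a hint, since resolvers can disagree for innocent reasons.

```python
NOERROR = 0
NXDOMAIN = 3  # "this name does not exist"


def response_code(response):
    """Extract the response code from a raw DNS reply."""
    if len(response) < 12:
        raise ValueError("a DNS message has at least a 12-byte header")
    return response[3] & 0x0F


def possibly_filtered(isp_response, public_response):
    """True when the ISP resolver denies a name that a public resolver answers."""
    return (
        response_code(isp_response) == NXDOMAIN
        and response_code(public_response) == NOERROR
    )
```

The two raw replies would come from sending the same query to the ISP's resolver and to a public one.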

Performance Impact of DNS

Slow DNS = Slow Internet (First Load)

Resolution flow:

  1. User visits new domain

  2. Browser queries DNS (latency depends on DNS server response time)

  3. DNS returns IP

  4. Browser connects to actual server

If the DNS response is slow (seconds rather than milliseconds), every new domain feels slow. Cached domains remain fast, but the first-load experience degrades.
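To see how much of a first load is spent on DNS, a lookup can be timed. `time_lookup` is a hypothetical helper; `lookup` is any callable that resolves a name (for instance `socket.gethostbyname`), and `clock` is a parameter only so the helper can be checked without real delays.

```python
import time


def time_lookup(hostname, lookup, clock=time.perf_counter):
    """Run lookup(hostname) and return (result, elapsed milliseconds)."""
    start = clock()
    result = lookup(hostname)
    elapsed_ms = (clock() - start) * 1000.0
    return result, elapsed_ms
```

Timing the same name twice in a row usually shows the effect described above: the second call tends to be much faster when a cache along the path answers it.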


Case Study: Delicious at Scale 🚀

Complete Infrastructure Flow

Setup:

  1. Joshua deploys web server on personal laptop

  2. Acquires internet connection (ISP → Router → Laptop)

  3. Purchases delicious.com from domain registrar

  4. Registrar submits mapping to ICANN

  5. DNS servers worldwide pick up the new mapping as they look it up and their caches refresh: delicious.com → Joshua's laptop IP
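Step 1 of the setup, a web server on a single machine, can be sketched with Python's built-in `http.server`. This is an illustration of the single-laptop architecture, not the software Delicious actually ran; `HomepageHandler`, `make_server`, and the page body are made up.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HomepageHandler(BaseHTTPRequestHandler):
    """Answer every GET with the same small homepage."""

    def do_GET(self):
        body = b"<h1>Delicious</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), HomepageHandler)
```

Calling `make_server().serve_forever()` starts handling requests one at a time on one machine, which is exactly the limitation the next section runs into.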

User access flow:

  1. User types delicious.com in browser

  2. Browser queries DNS server

  3. DNS returns Joshua’s laptop IP address

  4. Browser establishes TCP connection to laptop

  5. Web server responds with content

  6. User sees Delicious homepage


The Viral Growth Problem ⚠️

Initial scale: 50-100 users (manageable on laptop)

Growth trajectory: Word spreads → millions of users

Critical constraint: Delicious runs on a personal laptop (not enterprise hardware)

2003 hardware context:

  • Consumer laptops: ~128 MB RAM (megabytes, not gigabytes)

  • Limited CPU

  • Limited storage

  • Single machine running 24/7

The architectural crisis:

  • Millions of daily requests

  • Exponential traffic growth

  • Single laptop as bottleneck

  • No redundancy, no failover

The fundamental problem: A single consumer laptop cannot handle viral-scale traffic. The architecture must evolve from single-server to distributed infrastructure.


Key Takeaways 💡

  1. Human abstractions hide machine complexity.

  2. Centralized authority ≠ centralized traffic.

  3. Scale turns correctness into an availability problem.

  4. Caching is the real hero of the internet.

  5. TTL is a tradeoff, not a bug.

  6. Single points of failure are architectural red flags.

  7. Infrastructure evolves after success, not before.

  8. First-load latency defines user perception.

  9. Control planes shape power.

  10. Always design for 100× growth even if you don’t need it yet.

This article was originally published on Substack.
