Load Balancing Architecture and Service Discovery

March 5, 2026 · 4 min read

Tags: system design, high level design, HLD, distributed systems, scalability, microservices, load balancing, caching, database design, API design, software architecture

Load Balancing Architecture ⚖️

The IP Address Problem

Single server scenario:

  • DNS maps domain → single IP
  • User connects directly to server

500 server scenario:

  • Each server has unique IP address
  • Which IP does DNS return?
  • If DNS returns one IP, that server gets all traffic (defeats purpose of horizontal scaling)

Load Balancer Solution 🎯

Architecture:

[Client] → [Load Balancer] → [Backend Server Pool (499 servers)]

Load balancer implementation:

  1. Designate one machine from pool as load balancer
  2. Install load balancing software (Nginx, Kong, HAProxy)
  3. All client traffic routes through load balancer
  4. Load balancer distributes requests to backend servers

DNS configuration:

  • DNS maps domain → load balancer IP only
  • Backend server IPs remain internal/hidden
  • Clients only know load balancer's address

Load Balancer Responsibilities 📜

1. Abstraction (Unified View)

  • Present single system interface to clients
  • Hide distributed architecture complexity
  • Clients unaware of multiple backend servers

2. Load Distribution

  • Distribute requests evenly across servers
  • Prevent individual server overload
  • Maintain balanced utilization across pool

Key distinction: Router forwards traffic blindly. Load balancer intelligently distributes based on server capacity and load.
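The distribution step above can be sketched as a simple round-robin selector. This is a minimal illustration, not a production balancer; `BACKENDS` and the IPs in it are hypothetical placeholders for the backend pool:

```python
from itertools import cycle

# Hypothetical pool of backend server IPs (stands in for the 499 backends).
BACKENDS = ["192.168.1.5", "192.168.1.6", "192.168.1.7"]

class RoundRobinBalancer:
    """Cycles through backends so each gets an equal share of requests."""

    def __init__(self, backends):
        self._pool = cycle(backends)

    def pick(self):
        # Select the next backend in rotation; a real balancer would also
        # consult health and load data before forwarding the request.
        return next(self._pool)

lb = RoundRobinBalancer(BACKENDS)
print([lb.pick() for _ in range(4)])  # wraps back to the first backend
```

Round-robin is the simplest "intelligent" strategy; real load balancers layer weighting, least-connections, and health data on top of this rotation.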


Service Discovery: Tracking Available Servers 🕵️

The Discovery Problem

Load balancer must know:

  1. ✅ Which servers exist
  2. ✅ Which servers are healthy (operational)
  3. ✅ Which servers are failed/crashed

Challenge: Servers can fail at any time. Load balancer requires real-time awareness of backend pool health.

Solution #1: Heartbeat Mechanism (Push) 💓

Concept: Servers actively report their status to load balancer.

Implementation:

  1. Load balancer exposes endpoint: /heartbeat
  2. Each backend server periodically sends status (e.g., every 5 seconds):
POST /heartbeat { "server_ip": "192.168.1.5", "status": "alive" }
  3. Load balancer maintains list of active servers
  4. If server misses multiple consecutive heartbeats → marked as failed
  5. Failed servers removed from routing pool

Advantages:

  • ✅ Real-time health awareness
  • ✅ Failed servers automatically removed
  • ✅ No traffic routed to dead servers

Pattern: Push mechanism (servers push status to load balancer)
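The load-balancer side of the heartbeat mechanism can be sketched as a registry keyed by last-seen time. The interval and miss limit below are assumptions matching the example above, and the injectable clock is only there to keep the sketch testable:

```python
import time

HEARTBEAT_INTERVAL = 5   # seconds between heartbeats (assumed, per example above)
MISSED_LIMIT = 3         # consecutive missed heartbeats before marking failed

class HeartbeatRegistry:
    """Load-balancer side: tracks the last heartbeat time per server."""

    def __init__(self, now=time.monotonic):
        self._last_seen = {}
        self._now = now  # injectable clock (assumption, for testability)

    def heartbeat(self, server_ip):
        # Called whenever a POST /heartbeat arrives from a backend.
        self._last_seen[server_ip] = self._now()

    def active_servers(self):
        # A server that missed MISSED_LIMIT consecutive intervals is
        # considered failed and excluded from the routing pool.
        cutoff = self._now() - HEARTBEAT_INTERVAL * MISSED_LIMIT
        return [ip for ip, t in self._last_seen.items() if t >= cutoff]
```

Note the push property: the registry never contacts backends; it only expires entries that stop reporting.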

Solution #2: Health Check Mechanism (Pull) 🩺

Concept: Load balancer actively queries server health (pull approach).

Implementation:

  1. Load balancer periodically polls each server
  2. Servers expose health check endpoint: /health
GET http://192.168.1.5/health
  3. Server responds: 200 OK (healthy)
  4. Timeout or error after multiple attempts → server marked as failed

Pattern: Pull mechanism (load balancer pulls status from servers)
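The pull loop can be sketched as below. Here `probe` is a stand-in for the real GET /health HTTP call, and the retry count is an assumption:

```python
MAX_ATTEMPTS = 3  # assumed retry budget before declaring a server failed

def check_server(probe, attempts=MAX_ATTEMPTS):
    """Return True if any of `attempts` probes returns HTTP 200."""
    for _ in range(attempts):
        try:
            if probe() == 200:
                return True
        except TimeoutError:
            pass  # treat a timeout like a failed attempt and retry
    return False

def prune_pool(pool, probes):
    """Keep only the servers whose health check passes."""
    return [ip for ip in pool if check_server(probes[ip])]
```

The centralization is visible in the code: all detection logic lives in the balancer, and backends only need a trivial /health handler.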

Push vs Pull: Service Discovery Approaches 🤔

Heartbeat (Push):

  • Servers actively report status to load balancer
  • Real-time notification when server starts/fails
  • Distributed responsibility across all servers

Health Check (Pull):

  • Load balancer queries server status
  • Centralized responsibility in load balancer
  • Slightly delayed failure detection (polling interval)

Industry preference: Health checks more common due to centralized maintenance and simpler server implementation.

✅ Conclusion: Both are comparably efficient; the choice depends on where you want monitoring responsibility to reside.

New Server Registration 🆕

Problem: How does load balancer discover newly added servers?

Solution: New server self-registers with load balancer:

POST http://load-balancer/register { "server_ip": "192.168.1.250", "status": "ready" }

Principle: Registration is the server's responsibility. The load balancer cannot autonomously detect new infrastructure; servers must announce their presence.
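The load-balancer side of the registration endpoint can be sketched as follows. `handle_register` and the `registry` set are hypothetical names; the payload fields mirror the POST /register example above:

```python
import json

def handle_register(registry, body):
    """Handle a POST /register payload from a newly started backend.

    `registry` stands in for the balancer's set of known servers; only
    servers that report "ready" are added to the routing pool.
    """
    payload = json.loads(body)
    if payload.get("status") == "ready":
        registry.add(payload["server_ip"])
    return registry

registry = set()
handle_register(registry, '{"server_ip": "192.168.1.250", "status": "ready"}')
```

After registration, the new server is monitored like any other pool member via heartbeats or health checks.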


Load Balancer Performance: The 100× Advantage 💪

Application Server Workload

Responsibilities:

  1. Receive request
  2. Process through OSI layers (7 layers)
  3. Deserialize request
  4. Decrypt (if encrypted)
  5. Authorization checks
  6. Database queries
  7. Business logic execution
  8. Generate response
  9. Serialize response
  10. Encrypt response
  11. Send through OSI layers

Throughput: 100-1,000 requests/second

Load Balancer Workload

Responsibilities:

  1. Examine incoming request
  2. Select backend server
  3. Forward request

What load balancers DON'T do:

  1. ❌ Deserialization
  2. ❌ Decryption
  3. ❌ Authentication
  4. ❌ Database access
  5. ❌ Business logic
  6. ❌ Response generation

Throughput: 100,000+ requests/second

Performance ratio: Load balancers handle 100× more traffic than application servers due to minimal processing overhead.

OSI Model Context 🌐

7 Layers (bottom to top):

  1. Physical: Electrical signals, hardware
  2. Data Link: MAC addresses, frames
  3. Network: IP addresses, routing
  4. Transport: TCP/UDP, ports
  5. Session: Connection management
  6. Presentation: Encryption, data formatting
  7. Application: HTTP, web services

Load balancers operate at either the Transport layer (Layer 4) or the Application layer (Layer 7); Layer 4 processing carries lower overhead because it never parses application protocols.

Load Balancer OSI Layers

Layer 4 (Transport Layer):

  • TCP/UDP routing
  • Works with IP addresses and ports
  • No application protocol awareness

Layer 7 (Application Layer):

  • HTTP/HTTPS routing
  • Application protocol awareness
  • Can route based on URL paths, headers, cookies

Use case determines the layer: Layer 4 for maximum throughput and protocol-agnostic routing, Layer 7 for content-aware routing on paths, headers, or cookies.
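Layer 7 routing can be sketched as a prefix-matching routing table. The route table, pools, and IPs below are hypothetical; a real L7 balancer (e.g., Nginx) expresses the same idea in its configuration language:

```python
# Hypothetical Layer 7 routing table: the balancer inspects the URL path
# and forwards to the backend pool chosen for that path prefix.
ROUTES = {
    "/api/":    ["10.0.1.1", "10.0.1.2"],  # API servers (assumed IPs)
    "/static/": ["10.0.2.1"],              # static-asset servers
}
DEFAULT_POOL = ["10.0.3.1"]  # fallback pool for unmatched paths

def route(path):
    """Pick the backend pool by longest matching path prefix."""
    best, best_len = DEFAULT_POOL, -1
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = pool, len(prefix)
    return best
```

A Layer 4 balancer could not do this: it sees only IPs and ports, never the HTTP path.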


Key Takeaways 💡

  1. Load balancers are essential for distributed architectures.
  2. Service discovery is critical.
  3. Push (heartbeat) and pull (health check) discovery are comparably efficient; choose based on where monitoring responsibility should live.
  4. New servers must self-register.
  5. Load balancers handle 100× more traffic than application servers.
  6. OSI layer choice matters.