Advanced Load Balancing: Scaling and Routing

March 5, 2026 · 6 min read
system design · high level design · HLD · distributed systems · scalability · microservices · load balancing · caching · database design · API design · software architecture

Scaling Load Balancers: Eliminating Single Point of Failure 🎯

The Bottleneck Problem

Even at 100K req/s, load balancers have limits:

  • Internet-scale services (e.g., Google) must handle 10M+ req/s
  • Single load balancer = single point of failure
  • Load balancer crash = complete service outage

The Failed Approach: Hierarchical Load Balancing

Proposal: Load balancer in front of load balancers?

[Users] → [Meta-LB] → [Load Balancers] → [Backend Servers]

Problem: Meta-LB becomes new bottleneck and SPOF. Just moves the problem up one level.

Conclusion: Cannot solve SPOF with hierarchical redundancy.

The Solution: Parallel Load Balancers + DNS 🌐

Architecture: Multiple load balancers in parallel (not hierarchical).

DNS configuration: Register multiple IP addresses for single domain.

Example:

delicious.com → [10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4]

DNS Routing Strategies

Option 1: Return multiple IPs

  • Client receives list of load balancer IPs
  • Client randomly selects one
  • Simple distribution mechanism

Option 2: GeoDNS (geographic routing)

  • DNS detects user's geolocation via IP address
  • Returns IP of geographically closest load balancer
  • Minimizes latency by reducing physical distance
  • Benefit: Lower latency, better user experience

Load Balancer Failure Handling 💀

Question: When a load balancer fails, should we update DNS to remove its IP?

Answer: No. DNS propagation is too slow (hours to days).

Strategy:

  • Quickly restart failed load balancer, OR
  • Replace with new load balancer using same IP address

Client-side behavior:

  • Request to failed LB times out
  • Client automatically retries with different LB IP from list
  • Minimal service disruption
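The client-side behavior above can be sketched as a failover loop. This is a minimal sketch: `requestWithFailover` and `sendRequest` are hypothetical names, and `sendRequest` stands in for whatever transport function throws on a timeout or refused connection.

```javascript
// Client-side failover across the load balancer IPs returned by DNS.
// `sendRequest(ip, request)` is an assumed transport function that
// returns a response on success and throws on timeout.
function requestWithFailover(lbIps, request, sendRequest) {
  // Start from a random IP so clients spread load across the LBs.
  const start = Math.floor(Math.random() * lbIps.length);
  for (let i = 0; i < lbIps.length; i++) {
    const ip = lbIps[(start + i) % lbIps.length];
    try {
      return sendRequest(ip, request); // success: return the response
    } catch (err) {
      // Timeout or connection refused: fall through to the next LB.
    }
  }
  throw new Error("all load balancers unreachable");
}
```

Because the failed LB's IP stays in the list (the article's strategy is to restart or replace it under the same IP), the client simply pays one timeout and succeeds on the next attempt.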

Key insight: Don't update DNS for real-time failures. DNS caching makes propagation too slow for failure recovery.

Benefits of Multi-Load Balancer Architecture ✅

Advantages:

  1. ✅ No single point of failure (one fails → others continue)
  2. ✅ Horizontal scalability (add LBs as needed)
  3. ✅ Geographic distribution (place LBs near users)
  4. ✅ Maintenance flexibility (take one down for updates)

Architecture:

[Global Users]
      ↓
  [GeoDNS]  (returns closest LB)
      ↓
[LB-1]  [LB-2]  [LB-3]  ...  [LB-N]
      ↓
[Backend Server Pool: 500 servers]

The Response Path Question 🤔

Scenario:

  1. Client (Prem) sends request to load balancer
  2. Load balancer forwards to Backend Server #237
  3. Server processes and generates response

Question: Does response go directly to client, or back through load balancer?

Answer: Response returns through load balancer (same path, reversed).

Reverse Proxy vs Router 🔀

Load Balancers as Reverse Proxies

How reverse proxies work:

  1. Request termination: Load balancer receives and terminates client connection
  2. New request creation: Load balancer creates new request to backend server
  3. Server perspective: Backend thinks load balancer is the client
  4. Response handling: Server responds to load balancer
  5. Client delivery: Load balancer forwards response to original client

⚠️ Key insight: Client and server never communicate directly. Load balancer mediates both directions.

Router vs Reverse Proxy Comparison

Router:

  • Forwards packets without termination
  • No connection intermediation
  • Transparent pass-through
  • Like mail forwarding without opening envelopes

Reverse Proxy (Load Balancer):

  • Terminates incoming connections
  • Creates new outbound connections
  • Acts as middleman/intermediary
  • Like assistant who receives messages, rewrites them, sends on your behalf

Why responses must return through load balancer: Backend server only knows about load balancer, not original client. Return path must be symmetric.
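The mediation described above can be modeled with a tiny conceptual sketch. These are plain functions, not a real network server, and `reverseProxy` plus the request shape (`source`, `path`, `body`) are illustrative assumptions, not an actual API.

```javascript
// Conceptual model of reverse-proxy mediation: the load balancer
// terminates the client request and issues a brand-new request to the
// backend, so the backend only ever sees the LB as its client.
function reverseProxy(clientRequest, backend) {
  // 1. Terminate the client connection; extract what we need from it.
  const { path, body } = clientRequest;
  // 2. Create a NEW request whose source is the load balancer itself.
  const backendRequest = { source: "load-balancer", path, body };
  // 3. The backend responds to the load balancer...
  const backendResponse = backend(backendRequest);
  // 4. ...and the LB forwards that response to the original client.
  return backendResponse;
}
```

Note that the backend never observes the original `source`; this is exactly why the response path must run back through the load balancer.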


The Random Routing Problem 😱

Data Consistency Challenge

Scenario:

  • Day 1: User adds bookmark → routed to Server #42 → data saved on Server #42
  • Day 2: User views bookmarks → routed to Server #189 → Server #189 has no user data

Result: User cannot access their own data.

Problems with Random Routing

1. Data fragmentation

  • User A's data on Server 1
  • User B's data on Server 5
  • User C's data on Server 23
  • Random routing prevents users from finding their data

2. Inconsistent state

  • Update profile on Server 10
  • Next request routes to Server 50
  • Server 50 has stale data
  • System appears broken

3. Database architecture questions

  • Do all servers share one database?
  • Does each server have separate database?
  • How is data synchronized?

Conclusion: Random routing breaks data locality. Intelligent routing required.

Critical Unsolved Problems 🔴

Problem #1: Data Distribution (Sharding) 📊

Question: How to split data across 500 servers?

Strategies to explore:

  • Alphabetical (A-M on Servers 1-250, N-Z on Servers 251-500)?
  • Geographic distribution?
  • User ID-based partitioning?
  • Other approaches?
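As one concrete illustration of user-ID-based partitioning from the list above, here is a minimal sketch. The names (`serverForUser`) and the choice of FNV-1a as the hash are illustrative assumptions, not the article's prescription.

```javascript
// User-ID-based partitioning: a stable mapping from userID to one of
// 500 servers. Hashing the ID first spreads sequential IDs evenly
// instead of clustering them on neighboring servers.
const NUM_SERVERS = 500;

// FNV-1a: a simple, fast, non-cryptographic string hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

function serverForUser(userID) {
  return fnv1a(String(userID)) % NUM_SERVERS;
}
```

The same user always lands on the same server, which fixes the Day 1/Day 2 bookmark problem. The weakness of plain modulo: changing `NUM_SERVERS` remaps almost every user, which is what motivates consistent hashing.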

Problem #2: Intelligent Routing 🧭

Question: How does load balancer know which server contains which user's data?

Approaches to explore:

  1. Hash-based routing (user ID hashing)
  2. Round-robin
  3. Least connections
  4. Session affinity/sticky sessions
  5. Consistent hashing
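Approach 5, consistent hashing, can be sketched as a minimal hash ring. This is a sketch under stated assumptions: the class and method names (`HashRing`, `getServerForUser`) are illustrative, the hash is FNV-1a, and virtual nodes are used to even out the distribution.

```javascript
// Minimal consistent-hashing ring (illustrative, not production code).
// Each server gets several virtual points on a 32-bit ring; a key is
// routed to the first server point at or after the key's hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class HashRing {
  constructor(servers, vnodes = 100) {
    this.points = [];
    for (const server of servers) {
      for (let v = 0; v < vnodes; v++) {
        this.points.push({ hash: fnv1a(`${server}#${v}`), server });
      }
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  getServerForUser(userID) {
    const h = fnv1a(String(userID));
    // Binary search for the first point with hash >= h (wrap to 0 if none).
    let lo = 0, hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].server;
  }
}
```

Unlike plain modulo, adding or removing one server only remaps the keys that fell on that server's ring segments, leaving everyone else's data in place.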

Load Balancer Failure During Request 💀

Scenario:

  1. Client sends request
  2. Load balancer forwards to backend
  3. Load balancer crashes before response
  4. Backend processes and generates response
  5. Response has nowhere to go (LB dead)

Result:

  1. Client experiences request timeout
  2. Client automatically retries with different load balancer
  3. Eventually succeeds

Impact: Perceived as a slow request, not a catastrophic failure.

Why acceptable:

  1. Load balancer failures are rare
  2. Multiple load balancers provide redundancy
  3. Client retry logic handles transient failures

Load Balancer Request Routing 💻

Pseudo-code Implementation

Basic request handling flow:

// Load balancer receives request
const userID = request.userID;

// Determine target server using routing algorithm
const serverID = consistentHashing.getServerForUser(userID);

// Forward request to selected server
const response = await makeRequestTo(serverID, request);

// Return response to original caller
return response;

Request lifecycle:

  1. Load balancer receives client request
  2. Extract user/request metadata
  3. Apply routing algorithm (consistent hashing, round-robin, etc.)
  4. Forward to selected backend server
  5. Wait for server response (request thread remains open)
  6. Return response to original client

Concurrency Model

Handling parallel requests:

  1. Multiple simultaneous requests = multiple function instances
  2. Each request has dedicated thread
  3. Each maintains independent context
  4. Program counter tracks execution state
  5. Response returns to exact caller via thread context

✅ Result: Load balancer handles millions of concurrent requests through thread-level parallelism.
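In a single-threaded runtime like Node.js the same property holds via per-invocation context rather than OS threads: each call to the handler has its own local variables, so every response returns to exactly the caller that issued it. A sketch (all names are illustrative, and `setTimeout` stands in for a backend round trip of varying latency):

```javascript
// Simulated backend call with random latency.
function callBackend(serverID, request) {
  const delay = Math.floor(Math.random() * 20); // simulated latency in ms
  return new Promise((resolve) =>
    setTimeout(() => resolve(`response for ${request.userID}`), delay)
  );
}

async function handleRequest(request) {
  // Locals here are per-invocation: independent context per request.
  const serverID = 42; // routing elided in this sketch
  const response = await callBackend(serverID, request);
  return response;
}

// Many concurrent requests: each resolves with its own caller's
// response, regardless of the order in which backends reply.
async function demo() {
  const requests = ["alice", "bob", "carol"].map((u) => ({ userID: u }));
  return Promise.all(requests.map(handleRequest));
}
```

Even though the three backend replies arrive in arbitrary order, each caller receives its own response.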

SSL/TLS Termination 🔒

Network Architecture Layers

Untrusted network (public internet):

  • Client connections require encryption
  • SSL/TLS handshake occurs
  • Load balancer terminates SSL connection

Trusted VPC (private network):

  • Behind load balancer
  • Internal server communication
  • Can use unencrypted connections (performance optimization)

SSL termination point: Load balancer acts as security boundary.

Benefit: Internal traffic optimization while maintaining external security.
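One common way to realize this security boundary is SSL termination at a reverse proxy. An illustrative nginx configuration is sketched below; the certificate paths, backend IPs, and pool name are placeholders, not values from the article.

```nginx
# Illustrative nginx config: terminate TLS at the load balancer,
# forward plain HTTP inside the trusted VPC.
upstream backend_pool {
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

server {
    listen 443 ssl;
    server_name delicious.com;

    ssl_certificate     /etc/ssl/certs/delicious.com.pem;
    ssl_certificate_key /etc/ssl/private/delicious.com.key;

    location / {
        # Traffic beyond this point stays inside the trusted VPC,
        # so it is forwarded unencrypted for performance.
        proxy_pass http://backend_pool;
    }
}
```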


DNS Configuration for Load Balancers 🌐

Single Load Balancer Routing

Requirement: Route all traffic to specific load balancer (LB1)

Solution: Configure single A record in DNS

Example:

maya.com → 10.0.0.1 (LB1 IP address)

Result: All client requests automatically route to specified load balancer.

DNS vs Client Requests

Configuration phase:

  • Domain owner configures DNS through registrar dashboard
  • Sets A records, load balancer IPs, routing rules

Request phase:

  • Client queries DNS for IP address
  • DNS returns IP
  • Client connects directly to IP (does not contact registrar)

Key distinction: Registrar used for configuration only. Clients never interact with registrar during normal operation.


Key Takeaways 💡

  1. Multiple load balancers eliminate single point of failure.
  2. Hierarchical load balancing doesn't solve SPOF.
  3. DNS is too slow for real-time failure recovery.
  4. Reverse proxies enable security boundaries.
  5. Random routing breaks data locality.
  6. Intelligent routing requires data locality awareness.
  7. Load balancer failures are tolerable with redundancy.
  8. Request routing involves metadata extraction.