Advanced Load Balancing: Scaling and Routing

March 5, 2026 · 6 min read
system design · high level design · HLD · distributed systems · scalability · microservices · load balancing · caching · database design · API design · software architecture

Scaling Load Balancers: Eliminating Single Point of Failure 🎯

The Bottleneck Problem

Even at 100K req/s, load balancers have limits:

  • Internet-scale services (e.g., Google) must handle 10M+ req/s
  • Single load balancer = single point of failure
  • Load balancer crash = complete service outage

The Failed Approach: Hierarchical Load Balancing

Proposal: Load balancer in front of load balancers?

[Users] → [Meta-LB] → [Load Balancers] → [Backend Servers]

Problem: Meta-LB becomes new bottleneck and SPOF. Just moves the problem up one level.

Conclusion: Cannot solve SPOF with hierarchical redundancy.

The Solution: Parallel Load Balancers + DNS 🌐

Architecture: Multiple load balancers in parallel (not hierarchical).

DNS configuration: Register multiple IP addresses for single domain.

Example:

delicious.com → [10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4]

DNS Routing Strategies

Option 1: Return multiple IPs

  • Client receives list of load balancer IPs
  • Client randomly selects one
  • Simple distribution mechanism

Option 2: GeoDNS (geographic routing)

  • DNS detects user's geolocation via IP address
  • Returns IP of geographically closest load balancer
  • Minimizes latency by reducing physical distance
  • Benefit: Lower latency, better user experience

Load Balancer Failure Handling 💀

Question: When a load balancer fails, should we update DNS to remove its IP?

Answer: No. DNS propagation is too slow (hours to days).

Strategy:

  • Quickly restart failed load balancer, OR
  • Replace with new load balancer using same IP address

Client-side behavior:

  • Request to failed LB times out
  • Client automatically retries with different LB IP from list
  • Minimal service disruption
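The client-side behavior above can be sketched as a failover loop. This is a minimal sketch: `requestWithFailover` and `sendRequest` are hypothetical names, and `sendRequest` stands in for whatever transport function throws on a timeout or refused connection.

```javascript
// Client-side failover across the load balancer IPs returned by DNS.
// `sendRequest(ip, request)` is an assumed transport function that
// returns a response on success and throws on timeout.
function requestWithFailover(lbIps, request, sendRequest) {
  // Start from a random IP so clients spread load across the LBs.
  const start = Math.floor(Math.random() * lbIps.length);
  for (let i = 0; i < lbIps.length; i++) {
    const ip = lbIps[(start + i) % lbIps.length];
    try {
      return sendRequest(ip, request); // success: return the response
    } catch (err) {
      // Timeout or connection refused: fall through to the next LB.
    }
  }
  throw new Error("all load balancers unreachable");
}
```

Because the failed LB's IP stays in the list (the article's strategy is to restart or replace it under the same IP), the client simply pays one timeout and succeeds on the next attempt.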

Key insight: Don't update DNS for real-time failures. DNS caching makes propagation too slow for failure recovery.

Benefits of Multi-Load Balancer Architecture ✅

Advantages:

  1. ✅ No single point of failure (one fails → others continue)
  2. ✅ Horizontal scalability (add LBs as needed)
  3. ✅ Geographic distribution (place LBs near users)
  4. ✅ Maintenance flexibility (take one down for updates)

Architecture:

[Global Users]
      ↓
  [GeoDNS]  (returns closest LB)
      ↓
[LB-1]  [LB-2]  [LB-3]  ...  [LB-N]
      ↓
[Backend Server Pool: 500 servers]

The Response Path Question 🤔

Scenario:

  1. Client (Prem) sends request to load balancer
  2. Load balancer forwards to Backend Server #237
  3. Server processes and generates response

Question: Does response go directly to client, or back through load balancer?

Answer: Response returns through load balancer (same path, reversed).

Reverse Proxy vs Router 🔀

Load Balancers as Reverse Proxies

How reverse proxies work:

  1. Request termination: Load balancer receives and terminates client connection
  2. New request creation: Load balancer creates new request to backend server
  3. Server perspective: Backend thinks load balancer is the client
  4. Response handling: Server responds to load balancer
  5. Client delivery: Load balancer forwards response to original client

⚠️ Key insight: Client and server never communicate directly. Load balancer mediates both directions.

Router vs Reverse Proxy Comparison

Router:

  • Forwards packets without termination
  • No connection intermediation
  • Transparent pass-through
  • Like mail forwarding without opening envelopes

Reverse Proxy (Load Balancer):

  • Terminates incoming connections
  • Creates new outbound connections
  • Acts as middleman/intermediary
  • Like assistant who receives messages, rewrites them, sends on your behalf

Why responses must return through load balancer: Backend server only knows about load balancer, not original client. Return path must be symmetric.
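The mediation described above can be modeled with a tiny conceptual sketch. These are plain functions, not a real network server, and `reverseProxy` plus the request shape (`source`, `path`, `body`) are illustrative assumptions, not an actual API.

```javascript
// Conceptual model of reverse-proxy mediation: the load balancer
// terminates the client request and issues a brand-new request to the
// backend, so the backend only ever sees the LB as its client.
function reverseProxy(clientRequest, backend) {
  // 1. Terminate the client connection; extract what we need from it.
  const { path, body } = clientRequest;
  // 2. Create a NEW request whose source is the load balancer itself.
  const backendRequest = { source: "load-balancer", path, body };
  // 3. The backend responds to the load balancer...
  const backendResponse = backend(backendRequest);
  // 4. ...and the LB forwards that response to the original client.
  return backendResponse;
}
```

Note that the backend never observes the original `source`; this is exactly why the response path must run back through the load balancer.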


The Random Routing Problem 😱

Data Consistency Challenge

Scenario:

  • Day 1: User adds bookmark → routed to Server #42 → data saved on Server #42
  • Day 2: User views bookmarks → routed to Server #189 → Server #189 has no user data

Result: User cannot access their own data.

Problems with Random Routing

1. Data fragmentation

  • User A's data on Server 1
  • User B's data on Server 5
  • User C's data on Server 23
  • Random routing prevents users from finding their data

2. Inconsistent state

  • Update profile on Server 10
  • Next request routes to Server 50
  • Server 50 has stale data
  • System appears broken

3. Database architecture questions

  • Do all servers share one database?
  • Does each server have separate database?
  • How is data synchronized?

Conclusion: Random routing breaks data locality. Intelligent routing required.

Critical Unsolved Problems 🔴

Problem #1: Data Distribution (Sharding) 📊

Question: How to split data across 500 servers?

Strategies to explore:

  • Alphabetical (A-M on Servers 1-250, N-Z on Servers 251-500)?
  • Geographic distribution?
  • User ID-based partitioning?
  • Other approaches?
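As one concrete illustration of user-ID-based partitioning from the list above, here is a minimal sketch. The names (`serverForUser`) and the choice of FNV-1a as the hash are illustrative assumptions, not the article's prescription.

```javascript
// User-ID-based partitioning: a stable mapping from userID to one of
// 500 servers. Hashing the ID first spreads sequential IDs evenly
// instead of clustering them on neighboring servers.
const NUM_SERVERS = 500;

// FNV-1a: a simple, fast, non-cryptographic string hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned 32-bit
}

function serverForUser(userID) {
  return fnv1a(String(userID)) % NUM_SERVERS;
}
```

The same user always lands on the same server, which fixes the Day 1/Day 2 bookmark problem. The weakness of plain modulo: changing `NUM_SERVERS` remaps almost every user, which is what motivates consistent hashing.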

Problem #2: Intelligent Routing 🧭

Question: How does load balancer know which server contains which user's data?

Approaches to explore:

  1. Hash-based routing (user ID hashing)
  2. Round-robin
  3. Least connections
  4. Session affinity/sticky sessions
  5. Consistent hashing
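Approach 5, consistent hashing, can be sketched as a minimal hash ring. This is a sketch under stated assumptions: the class and method names (`HashRing`, `getServerForUser`) are illustrative, the hash is FNV-1a, and virtual nodes are used to even out the distribution.

```javascript
// Minimal consistent-hashing ring (illustrative, not production code).
// Each server gets several virtual points on a 32-bit ring; a key is
// routed to the first server point at or after the key's hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class HashRing {
  constructor(servers, vnodes = 100) {
    this.points = [];
    for (const server of servers) {
      for (let v = 0; v < vnodes; v++) {
        this.points.push({ hash: fnv1a(`${server}#${v}`), server });
      }
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  getServerForUser(userID) {
    const h = fnv1a(String(userID));
    // Binary search for the first point with hash >= h (wrap to 0 if none).
    let lo = 0, hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].server;
  }
}
```

Unlike plain modulo, adding or removing one server only remaps the keys that fell on that server's ring segments, leaving everyone else's data in place.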

Load Balancer Failure During Request 💀

Scenario:

  1. Client sends request
  2. Load balancer forwards to backend
  3. Load balancer crashes before response
  4. Backend processes and generates response
  5. Response has nowhere to go (LB dead)

Result:

  1. Client experiences request timeout
  2. Client automatically retries with different load balancer
  3. Eventually succeeds

Impact: Perceived as a slow request, not a catastrophic failure.

Why acceptable:

  1. Load balancer failures are rare
  2. Multiple load balancers provide redundancy
  3. Client retry logic handles transient failures

Load Balancer Request Routing 💻

Pseudo-code Implementation

Basic request handling flow:

// Load balancer receives request
const userID = request.userID;

// Determine target server using routing algorithm
const serverID = consistentHashing.getServerForUser(userID);

// Forward request to selected server
const response = await makeRequestTo(serverID, request);

// Return response to original caller
return response;

Request lifecycle:

  1. Load balancer receives client request
  2. Extract user/request metadata
  3. Apply routing algorithm (consistent hashing, round-robin, etc.)
  4. Forward to selected backend server
  5. Wait for server response (request thread remains open)
  6. Return response to original client

Concurrency Model

Handling parallel requests:

  1. Multiple simultaneous requests = multiple function instances
  2. Each request has dedicated thread
  3. Each maintains independent context
  4. Program counter tracks execution state
  5. Response returns to exact caller via thread context

✅ Result: Load balancer handles millions of concurrent requests through thread-level parallelism.
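In a single-threaded runtime like Node.js the same property holds via per-invocation context rather than OS threads: each call to the handler has its own local variables, so every response returns to exactly the caller that issued it. A sketch (all names are illustrative, and `setTimeout` stands in for a backend round trip of varying latency):

```javascript
// Simulated backend call with random latency.
function callBackend(serverID, request) {
  const delay = Math.floor(Math.random() * 20); // simulated latency in ms
  return new Promise((resolve) =>
    setTimeout(() => resolve(`response for ${request.userID}`), delay)
  );
}

async function handleRequest(request) {
  // Locals here are per-invocation: independent context per request.
  const serverID = 42; // routing elided in this sketch
  const response = await callBackend(serverID, request);
  return response;
}

// Many concurrent requests: each resolves with its own caller's
// response, regardless of the order in which backends reply.
async function demo() {
  const requests = ["alice", "bob", "carol"].map((u) => ({ userID: u }));
  return Promise.all(requests.map(handleRequest));
}
```

Even though the three backend replies arrive in arbitrary order, each caller receives its own response.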

SSL/TLS Termination 🔒

Network Architecture Layers

Untrusted network (public internet):

  • Client connections require encryption
  • SSL/TLS handshake occurs
  • Load balancer terminates SSL connection

Trusted VPC (private network):

  • Behind load balancer
  • Internal server communication
  • Can use unencrypted connections (performance optimization)

SSL termination point: Load balancer acts as security boundary.

Benefit: Internal traffic optimization while maintaining external security.
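One common way to realize this security boundary is SSL termination at a reverse proxy. An illustrative nginx configuration is sketched below; the certificate paths, backend IPs, and pool name are placeholders, not values from the article.

```nginx
# Illustrative nginx config: terminate TLS at the load balancer,
# forward plain HTTP inside the trusted VPC.
upstream backend_pool {
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}

server {
    listen 443 ssl;
    server_name delicious.com;

    ssl_certificate     /etc/ssl/certs/delicious.com.pem;
    ssl_certificate_key /etc/ssl/private/delicious.com.key;

    location / {
        # Traffic beyond this point stays inside the trusted VPC,
        # so it is forwarded unencrypted for performance.
        proxy_pass http://backend_pool;
    }
}
```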


DNS Configuration for Load Balancers 🌐

Single Load Balancer Routing

Requirement: Route all traffic to specific load balancer (LB1)

Solution: Configure single A record in DNS

Example:

maya.com → 10.0.0.1 (LB1 IP address)

Result: All client requests automatically route to specified load balancer.

DNS vs Client Requests

Configuration phase:

  • Domain owner configures DNS through registrar dashboard
  • Sets A records, load balancer IPs, routing rules

Request phase:

  • Client queries DNS for IP address
  • DNS returns IP
  • Client connects directly to IP (does not contact registrar)

Key distinction: Registrar used for configuration only. Clients never interact with registrar during normal operation.


Key Takeaways 💡

  1. Multiple load balancers eliminate single point of failure.
  2. Hierarchical load balancing doesn't solve SPOF.
  3. DNS is too slow for real-time failure recovery.
  4. Reverse proxies enable security boundaries.
  5. Random routing breaks data locality.
  6. Intelligent routing requires data locality awareness.
  7. Load balancer failures are tolerable with redundancy.
  8. Request routing involves metadata extraction.