Advanced Load Balancing: Scaling and Routing
Scaling Load Balancers: Eliminating Single Point of Failure 🎯
The Bottleneck Problem
Even at 100K req/s, load balancers have limits:
- Internet-scale services (Google) require 10M+ req/s
- Single load balancer = single point of failure
- Load balancer crash = complete service outage
The Failed Approach: Hierarchical Load Balancing
Proposal: Load balancer in front of load balancers?
[Users] → [Meta-LB] → [Load Balancers] → [Backend Servers]

Problem: The Meta-LB becomes the new bottleneck and SPOF. This just moves the problem up one level.
Conclusion: Cannot solve SPOF with hierarchical redundancy.
The Solution: Parallel Load Balancers + DNS 🌐
Architecture: Multiple load balancers in parallel (not hierarchical).
DNS configuration: Register multiple IP addresses for single domain.
Example:
delicious.com → [10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4]

DNS Routing Strategies
Option 1: Return multiple IPs
- Client receives list of load balancer IPs
- Client randomly selects one
- Simple distribution mechanism
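Option 1 can be sketched as a few lines of client-side logic. This is a minimal sketch, assuming the DNS lookup has already returned the IP list; the IPs are illustrative, not real addresses.

```javascript
// Sketch: a client picking one load balancer from the IP list DNS returned.
// The addresses are illustrative placeholders.
const lbIPs = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"];

function pickLoadBalancer(ips) {
  // A uniform random choice spreads clients across all LBs on average.
  const index = Math.floor(Math.random() * ips.length);
  return ips[index];
}

console.log(pickLoadBalancer(lbIPs)); // one of the four IPs
```

Real resolvers often help here too: many rotate the order of returned IPs, so even clients that always pick the first entry get spread out.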
Option 2: GeoDNS (geographic routing)
- DNS detects user's geolocation via IP address
- Returns IP of geographically closest load balancer
- Minimizes latency by reducing physical distance
- Benefit: Lower latency, better user experience
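The GeoDNS idea reduces to a lookup table from client region to the nearest load balancer. A minimal sketch, assuming a hypothetical region table (the region names and IPs are made up for illustration):

```javascript
// Sketch: GeoDNS resolution as a region → nearest-LB lookup.
// Regions and IPs are hypothetical.
const regionToLB = {
  "us-east": "10.0.0.1",
  "eu-west": "10.0.0.2",
  "ap-south": "10.0.0.3",
};

function resolveGeoDNS(clientRegion) {
  // Return the LB nearest the client's region; fall back to a default LB
  // when the region is unknown.
  return regionToLB[clientRegion] ?? "10.0.0.1";
}
```

A production GeoDNS service derives `clientRegion` from the resolver's IP using a geolocation database rather than receiving it explicitly.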
Load Balancer Failure Handling 💀
Question: When load balancer fails, update DNS?
Answer: No. DNS propagation is too slow (cached records can take hours to days to expire).
Strategy:
- Quickly restart failed load balancer, OR
- Replace with new load balancer using same IP address
Client-side behavior:
- Request to failed LB times out
- Client automatically retries with different LB IP from list
- Minimal service disruption
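The client-side failover behavior above can be sketched as a loop over the IP list. This is a simplified sketch: `down` simulates LBs that would time out, standing in for a real network call with a timeout.

```javascript
// Sketch: client retries across LB IPs when one fails.
// `down` is a Set of IPs that simulate timeouts (stand-in for real network errors).
function sendWithFailover(ips, down, request) {
  for (const ip of ips) {
    if (down.has(ip)) continue;           // simulated timeout: try the next IP
    return { servedBy: ip, status: 200 }; // first healthy LB serves the request
  }
  throw new Error("all load balancers unreachable");
}

// LB 10.0.0.1 is down, so the request lands on 10.0.0.2.
const result = sendWithFailover(
  ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
  new Set(["10.0.0.1"]),
  { path: "/bookmarks" }
);
```

In practice the retry is driven by a connection timeout rather than an explicit down-list, which is why the user perceives a slow request rather than an outage.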
Key insight: Don't update DNS for real-time failures. DNS caching makes propagation too slow for failure recovery.
Benefits of Multi-Load Balancer Architecture ✅
Advantages:
- ✅ No single point of failure (one fails → others continue)
- ✅ Horizontal scalability (add LBs as needed)
- ✅ Geographic distribution (place LBs near users)
- ✅ Maintenance flexibility (take one down for updates)
Architecture:
[Global Users]
↓
[GeoDNS] (returns closest LB)
↓
[LB-1] [LB-2] [LB-3] ... [LB-N]
↓
[Backend Server Pool: 500 servers]

The Response Path Question 🤔
Scenario:
- Client (Prem) sends request to load balancer
- Load balancer forwards to Backend Server #237
- Server processes and generates response
Question: Does response go directly to client, or back through load balancer?
Answer: Response returns through load balancer (same path, reversed).
Reverse Proxy vs Router 🔀
Load Balancers as Reverse Proxies
How reverse proxies work:
- Request termination: Load balancer receives and terminates client connection
- New request creation: Load balancer creates new request to backend server
- Server perspective: Backend thinks load balancer is the client
- Response handling: Server responds to load balancer
- Client delivery: Load balancer forwards response to original client
Router vs Reverse Proxy Comparison
Router:
- Forwards packets without termination
- No connection intermediation
- Transparent pass-through
- Like mail forwarding without opening envelopes
Reverse Proxy (Load Balancer):
- Terminates incoming connections
- Creates new outbound connections
- Acts as middleman/intermediary
- Like assistant who receives messages, rewrites them, sends on your behalf
Why responses must return through load balancer: Backend server only knows about load balancer, not original client. Return path must be symmetric.
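The termination-and-reissue behavior can be shown with a toy model. This is a conceptual sketch, not a real proxy API: `sourceIP`, `reverseProxy`, and the backend callback are hypothetical names invented for illustration.

```javascript
// Sketch: the LB terminates the client connection and issues a NEW request
// with itself as the source, so the backend only ever sees the LB.
const LB_IP = "10.0.0.1";

function reverseProxy(clientRequest, backend) {
  // Terminate the client connection; build a fresh request from the LB.
  const backendRequest = { ...clientRequest, sourceIP: LB_IP };
  const backendResponse = backend(backendRequest); // backend replies to the LB
  return backendResponse;                          // LB relays it to the client
}

// The backend records which "client" it saw.
const backend = (req) => ({ body: "ok", seenClient: req.sourceIP });
const res = reverseProxy({ sourceIP: "203.0.113.7", path: "/bookmarks" }, backend);
// res.seenClient is "10.0.0.1" — the backend never saw the real client IP.
```

This is also why real proxies add headers like `X-Forwarded-For`: without them, the original client IP is invisible to the backend.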
The Random Routing Problem 😱
Data Consistency Challenge
Scenario:
- Day 1: User adds bookmark → routed to Server #42 → data saved on Server #42
- Day 2: User views bookmarks → routed to Server #189 → Server #189 has no user data
Result: User cannot access their own data.
Problems with Random Routing
1. Data fragmentation
- User A's data on Server 1
- User B's data on Server 5
- User C's data on Server 23
- Random routing prevents users from finding their data
2. Inconsistent state
- Update profile on Server 10
- Next request routes to Server 50
- Server 50 has stale data
- System appears broken
3. Database architecture questions
- Do all servers share one database?
- Does each server have separate database?
- How is data synchronized?
Conclusion: Random routing breaks data locality. Intelligent routing required.
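The Day 1 / Day 2 failure above can be reproduced in a few lines. A minimal simulation: each "server" is just a local Map standing in for per-server storage, and the two server picks are fixed to the numbers from the scenario.

```javascript
// Sketch: random routing losing a user's data.
// Each server's storage is simulated by a Map.
const servers = Array.from({ length: 500 }, () => new Map());

function randomServer() {
  // What a randomly-routing LB effectively does per request.
  return Math.floor(Math.random() * servers.length);
}

// Day 1: the LB happens to pick server 42; the bookmark lives only there.
const writeServer = 42;
servers[writeServer].set("prem", ["https://example.com"]);

// Day 2: a different pick (server 189) has never heard of this user.
const readServer = 189;
const bookmarks = servers[readServer].get("prem"); // undefined
```

With 500 servers, a second random pick finds the right server only ~0.2% of the time, which is why random routing is unusable for stateful data.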
Critical Unsolved Problems 🔴
Problem #1: Data Distribution (Sharding) 📊
Question: How to split data across 500 servers?
Strategies to explore:
- Alphabetical (A-M on Servers 1-250, N-Z on Servers 251-500)?
- Geographic distribution?
- User ID-based partitioning?
- Other approaches?
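Two of the strategies above can be sketched directly. These are illustrative sketches, not recommendations: the hash function is a toy (djb2-style) and the alphabetical split returns only the start of each server range.

```javascript
// Sketch: two partitioning strategies over 500 servers.
const NUM_SERVERS = 500;

function shardByHash(userID) {
  // Toy string hash (djb2-style); a real system would use a stronger hash.
  let h = 5381;
  for (const ch of userID) h = (h * 33 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SERVERS;
}

function shardAlphabetical(username) {
  // A–M → servers 0–249, N–Z → servers 250–499.
  // Coarse, and skews badly with real name distributions.
  const first = username[0].toLowerCase();
  return first <= "m" ? 0 : 250; // start of the assigned range, illustrative only
}
```

Note the trade-off the sketch exposes: hashing spreads load evenly but scatters related users, while alphabetical ranges are human-readable but uneven.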
Problem #2: Intelligent Routing 🧭
Question: How does load balancer know which server contains which user's data?
Approaches to explore:
- Hash-based routing (user ID hashing)
- Round-robin
- Least connections
- Session affinity/sticky sessions
- Consistent hashing
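The last approach, consistent hashing, can be sketched as a sorted ring of server hash points. A minimal sketch under simplifying assumptions: one point per server (real systems add virtual nodes), and an FNV-1a hash chosen only for brevity.

```javascript
// Sketch: consistent hashing. Servers sit at hash points on a ring;
// a user maps to the first server clockwise from the user's own hash.
function hash32(s) {
  let h = 2166136261; // FNV-1a
  for (const ch of s) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

function buildRing(serverIDs) {
  return serverIDs
    .map((id) => ({ id, point: hash32(id) }))
    .sort((a, b) => a.point - b.point);
}

function serverForUser(ring, userID) {
  const p = hash32(userID);
  // First server at or past the user's point; wrap around to the ring start.
  const node = ring.find((n) => n.point >= p) ?? ring[0];
  return node.id;
}
```

The key property: the same user always lands on the same server, and adding or removing a server only remaps the keys between it and its ring neighbor.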
Load Balancer Failure During Request 💀
Scenario:
- Client sends request
- Load balancer forwards to backend
- Load balancer crashes before response
- Backend processes and generates response
- Response has nowhere to go (LB dead)
Result:
- Client experiences request timeout
- Client automatically retries with different load balancer
- Eventually succeeds
Impact: Perceived as slow request, not catastrophic failure.
Why acceptable:
- Load balancer failures are rare
- Multiple load balancers provide redundancy
- Client retry logic handles transient failures
Load Balancer Request Routing 💻
Pseudo-code Implementation
Basic request handling flow:
async function handleRequest(request) {
  const userID = request.userID;                               // Load balancer receives request
  const serverID = consistentHashing.getServerForUser(userID); // Determine target server via routing algorithm
  const response = await makeRequestTo(serverID, request);     // Forward request to selected backend
  return response;                                             // Return response to original caller
}

Request lifecycle:
- Load balancer receives client request
- Extract user/request metadata
- Apply routing algorithm (consistent hashing, round-robin, etc.)
- Forward to selected backend server
- Wait for server response (request thread remains open)
- Return response to original client
Concurrency Model
Handling parallel requests:
- Multiple simultaneous requests = multiple function instances
- Each request has dedicated thread
- Each maintains independent context
- Program counter tracks execution state
- Response returns to exact caller via thread context
SSL/TLS Termination 🔒
Network Architecture Layers
Untrusted network (public internet):
- Client connections require encryption
- SSL/TLS handshake occurs
- Load balancer terminates SSL connection
Trusted VPC (private network):
- Behind load balancer
- Internal server communication
- Can use unencrypted connections (performance optimization)
SSL termination point: Load balancer acts as security boundary.
Benefit: Internal traffic optimization while maintaining external security.
DNS Configuration for Load Balancers 🌐
Single Load Balancer Routing
Requirement: Route all traffic to specific load balancer (LB1)
Solution: Configure single A record in DNS
Example:
maya.com → 10.0.0.1 (LB1 IP address)

Result: All client requests automatically route to the specified load balancer.
DNS vs Client Requests
Configuration phase:
- Domain owner configures DNS through registrar dashboard
- Sets A records, load balancer IPs, routing rules
Request phase:
- Client queries DNS for IP address
- DNS returns IP
- Client connects directly to IP (does not contact registrar)
Key distinction: Registrar used for configuration only. Clients never interact with registrar during normal operation.
Key Takeaways 💡
- Multiple load balancers eliminate single point of failure.
- Hierarchical load balancing doesn't solve SPOF.
- DNS is too slow for real-time failure recovery.
- Reverse proxies enable security boundaries.
- Random routing breaks data locality.
- Intelligent routing requires data locality awareness.
- Load balancer failures are tolerable with redundancy.
- Request routing involves metadata extraction.