Load Balancing for Distributed Caches: Consistent Hashing vs. Round-Robin
The Fundamental Question ⚖️
When you have multiple cache servers (whether locally or globally distributed), what routing algorithm should the load balancer use?
Answer: It depends on WHY you distributed the cache.
Reason 1: Data is Too Large (Sharding)
Problem: Data exceeds single server capacity
Solution: Shard data across cache servers
Characteristic: Different cache servers have different data
Routing algorithm: Consistent hashing
Why Consistent Hashing?
To fetch user X's data, you must route to the specific server holding that data. Random routing would cause cache misses.
Example:
- Server 1: Users A-F
- Server 2: Users G-M
- Server 3: Users N-Z
Request for User H must go to Server 2 (consistent hashing ensures this).
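The routing rule above can be sketched with a minimal consistent hash ring. This is an illustrative sketch with hypothetical server names, not a production implementation (a real one would add virtual nodes so load spreads evenly):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key (or server name) to a point on the hash ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Place each server at a fixed point on the ring, sorted by position.
        self.ring = sorted((_hash(s), s) for s in servers)
        self.points = [p for p, _ in self.ring]

    def route(self, key: str) -> str:
        # Walk clockwise to the first server at or after the key's point.
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
# The same key always maps to the same server, so its shard is always found.
assert ring.route("user:H") == ring.route("user:H")
```

Because routing is a pure function of the key, every request for User H lands on the one server that holds User H's data, avoiding the cache misses that random routing would cause.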
Reason 2: Request Volume is Too High (Replication)
Problem: Single server cannot handle request volume
Solution: Replicate data across cache servers
Characteristic: Different cache servers have the same data
Routing algorithm: Round-robin
Why Round-Robin?
Any server can handle any request since they all have identical data. Distribute requests evenly for load balancing.
Example:
- Server 1: All product catalog data
- Server 2: All product catalog data
- Server 3: All product catalog data
Request for Product 123 can go to any server (round-robin distributes load).
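Round-robin can be sketched in a few lines; the server names are hypothetical and the point is that routing ignores the request contents entirely:

```python
from itertools import cycle

# Every replica holds the full product catalog, so any of them can answer.
replicas = cycle(["server-1", "server-2", "server-3"])

def route(_request_id: str) -> str:
    # Ignore what is being requested; just take the next server in rotation.
    return next(replicas)

# Three consecutive requests land on three different servers.
servers = [route(f"product-123-view-{i}") for i in range(3)]
```

Contrast this with consistent hashing: here the key plays no role in routing, which is only safe because all replicas are identical.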
The Salespeople Analogy 📞
Scenario A: Too Many Customers (Data Sharding)
You have 10 salespeople because you have 1,000 customers:
- Salesperson 1: Customers 1-100
- Salesperson 2: Customers 101-200
- ...and so on
Routing: To get Customer #150's phone number, you must go to Salesperson 2 → Use consistent hashing
Scenario B: Too Many Requests (Data Replication)
You have 10 customers, but 100,000 requests per day:
- All 10 salespeople have phone books with the same 10 customers
- Any salesperson can handle any request
Routing: Distribute requests evenly across available salespeople → Use round-robin
Real Example: Scaler's Code Judge Service 💻
System: Online code evaluation platform
Problem: Students submit code for evaluation against test cases. A single server cannot handle the request volume.
Architecture:
- Multiple app servers (round-robin load balancing)
- Each server caches test case data locally
- All servers have identical cached data (same test cases)
Flow:
- Student submits code
- Load balancer routes the request to the next app server in rotation (round-robin)
- App server evaluates code using locally cached test cases
- Returns results
Why this works: Test case data is static and identical across all servers. High request volume requires multiple servers, but data size is manageable for local caching.
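The flow above can be sketched as follows. All names and the test-case data are hypothetical; this is a toy model of the architecture, not Scaler's actual implementation:

```python
from itertools import cycle

# Static test cases, identical across all app servers (hypothetical data).
TEST_CASES = {"sum-two-numbers": [((1, 2), 3), ((5, 7), 12)]}

class AppServer:
    def __init__(self, name):
        self.name = name
        # Each server keeps its own local copy of the (identical) test cases.
        self.local_cache = dict(TEST_CASES)

    def evaluate(self, problem, submitted_fn):
        # Run the submission against locally cached test cases; no remote fetch.
        cases = self.local_cache[problem]
        return all(submitted_fn(*args) == expected for args, expected in cases)

servers = cycle([AppServer("app-1"), AppServer("app-2"), AppServer("app-3")])

def submit(problem, fn):
    # Round-robin works because every server's cache holds the same data.
    return next(servers).evaluate(problem, fn)

result = submit("sum-two-numbers", lambda a, b: a + b)  # True
```

Because the test cases never change and fit in memory, each server can cache them locally, and the load balancer is free to pick any server for any submission.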
Load Balancer's Role in Distributed Caches 🔀
For Global Distributed Cache
- App servers see the cache as a single logical unit
- Load balancer abstracts away the fact that there are multiple Redis servers
- App servers don't care which physical cache server handles the request
For Local Distributed Cache
- Users don't care which app server handles their request
- Load balancer routes requests to appropriate app servers
- Each app server then uses its own local cache
Decision Framework 🎯
Ask yourself: WHY did we distribute the cache?
If Data Was Too Large:
WHY: Data doesn't fit on one server
HOW: Shard data across servers
RESULT: Different servers have different data
ROUTING: Consistent hashing

If Requests Were Too High:
WHY: One server can't handle request volume
HOW: Replicate data across servers
RESULT: Different servers have same data
ROUTING: Round-robin

Example: Twitter's Elon Musk Profile
Problem: Elon Musk's Twitter profile is viewed 100 million times per day.
Naive solution: Compute the profile once, cache it in a single Redis server.
Bottleneck: A single cache server cannot handle 100 million requests per day.
Optimized solution:
- Cache Elon Musk's profile in multiple Redis servers with identical data
- Use round-robin to distribute requests across all cache servers
- Each server serves the same cached profile, but load is distributed
Local Cache with Round-Robin 🔄
What happens if you use round-robin with local caches?
Requests are distributed evenly across app servers. Each server caches the data it encounters. Eventually, all servers end up with similar cached data.
Use case: High request volume where a single server cannot handle the load, but data size is manageable. Multiple servers process the same type of requests using identical cached data.
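This convergence can be simulated with a cache-aside loop; the backing store, keys, and server count here are hypothetical:

```python
from itertools import cycle

# Hypothetical source of truth (e.g., a database behind the app servers).
BACKING_STORE = {"product:1": "widget", "product:2": "gadget"}

class AppServer:
    def __init__(self):
        self.cache = {}

    def get(self, key):
        # Cache-aside: on a miss, fetch from the backing store and keep a copy.
        if key not in self.cache:
            self.cache[key] = BACKING_STORE[key]
        return self.cache[key]

servers = [AppServer(), AppServer(), AppServer()]
rotation = cycle(servers)

# After a few rounds of round-robin traffic touching every key,
# each server's local cache converges to the same contents.
for _ in range(3):
    for key in BACKING_STORE:
        next(rotation).get(key)

assert all(s.cache == BACKING_STORE for s in servers)
```

The first request for each key on each server is a miss, so per-server hit rates start lower than with a shared cache, but once warmed, every server answers from local memory.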
Next: Cache eviction policies — LRU, FIFO, and why LRU wins 99% of the time.