Load Balancing for Distributed Caches: Consistent Hashing vs. Round-Robin
The Fundamental Question ⚖️
When you have multiple cache servers (whether locally or globally distributed), what routing algorithm should the load balancer use?
Answer: It depends on WHY you distributed the cache.
Reason 1: Data is Too Large (Sharding)
Problem: Data exceeds single server capacity
Solution: Shard data across cache servers
Characteristic: Different cache servers have different data
Routing algorithm: Consistent hashing
Why Consistent Hashing?
To fetch user X's data, you must route to the specific server holding that data. Random routing would cause cache misses.
Example:
- Server 1: Users A-F
- Server 2: Users G-M
- Server 3: Users N-Z
Request for User H must go to Server 2 (consistent hashing ensures this).
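The routing rule above can be sketched with a minimal consistent hash ring. This is an illustrative sketch with hypothetical server names, not a production implementation (a real one would add virtual nodes so load spreads evenly):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a key (or server name) to a point on the hash ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # Place each server at a fixed point on the ring, sorted by position.
        self.ring = sorted((_hash(s), s) for s in servers)
        self.points = [p for p, _ in self.ring]

    def route(self, key: str) -> str:
        # Walk clockwise to the first server at or after the key's point.
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
# The same key always maps to the same server, so its shard is always found.
assert ring.route("user:H") == ring.route("user:H")
```

Because routing is a pure function of the key, every request for User H lands on the one server that holds User H's data, avoiding the cache misses that random routing would cause.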
Reason 2: Request Volume is Too High (Replication)
Problem: Single server cannot handle request volume
Solution: Replicate data across cache servers
Characteristic: Different cache servers have the same data
Routing algorithm: Round-robin
Why Round-Robin?
Any server can handle any request since they all have identical data. Distribute requests evenly for load balancing.
Example:
- Server 1: All product catalog data
- Server 2: All product catalog data
- Server 3: All product catalog data
Request for Product 123 can go to any server (round-robin distributes load).
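Round-robin can be sketched in a few lines; the server names are hypothetical and the point is that routing ignores the request contents entirely:

```python
from itertools import cycle

# Every replica holds the full product catalog, so any of them can answer.
replicas = cycle(["server-1", "server-2", "server-3"])

def route(_request_id: str) -> str:
    # Ignore what is being requested; just take the next server in rotation.
    return next(replicas)

# Three consecutive requests land on three different servers.
servers = [route(f"product-123-view-{i}") for i in range(3)]
```

Contrast this with consistent hashing: here the key plays no role in routing, which is only safe because all replicas are identical.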
The Salespeople Analogy 📞
Scenario A: Too Many Customers (Data Sharding)
You have 10 salespeople because you have 1,000 customers:
- Salesperson 1: Customers 1-100
- Salesperson 2: Customers 101-200
- ...and so on
Routing: To get Customer #150's phone number, you must go to Salesperson 2 → Use consistent hashing
Scenario B: Too Many Requests (Data Replication)
You have 10 customers, but 100,000 requests per day:
- All 10 salespeople have phone books with the same 10 customers
- Any salesperson can handle any request
Routing: Distribute requests evenly across available salespeople → Use round-robin
Real Example: Scaler's Code Judge Service 💻
System: Online code evaluation platform
Problem: Students submit code for evaluation against test cases. A single server cannot handle the request volume.
Architecture:
- Multiple app servers (round-robin load balancing)
- Each server caches test case data locally
- All servers have identical cached data (same test cases)
Flow:
- Student submits code
- Load balancer routes the request to the next app server in rotation (round-robin)
- App server evaluates code using locally cached test cases
- Returns results
Why this works: Test case data is static and identical across all servers. High request volume requires multiple servers, but data size is manageable for local caching.
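The flow above can be sketched as follows. All names and the test-case data are hypothetical; this is a toy model of the architecture, not Scaler's actual implementation:

```python
from itertools import cycle

# Static test cases, identical across all app servers (hypothetical data).
TEST_CASES = {"sum-two-numbers": [((1, 2), 3), ((5, 7), 12)]}

class AppServer:
    def __init__(self, name):
        self.name = name
        # Each server keeps its own local copy of the (identical) test cases.
        self.local_cache = dict(TEST_CASES)

    def evaluate(self, problem, submitted_fn):
        # Run the submission against locally cached test cases; no remote fetch.
        cases = self.local_cache[problem]
        return all(submitted_fn(*args) == expected for args, expected in cases)

servers = cycle([AppServer("app-1"), AppServer("app-2"), AppServer("app-3")])

def submit(problem, fn):
    # Round-robin works because every server's cache holds the same data.
    return next(servers).evaluate(problem, fn)

result = submit("sum-two-numbers", lambda a, b: a + b)  # True
```

Because the test cases never change and fit in memory, each server can cache them locally, and the load balancer is free to pick any server for any submission.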
Load Balancer's Role in Distributed Caches 🔀
For Global Distributed Cache
- App servers see the cache as a single logical unit
- Load balancer abstracts away the fact that there are multiple Redis servers
- App servers don't care which physical cache server handles the request
For Local Distributed Cache
- Users don't care which app server handles their request
- Load balancer routes requests to appropriate app servers
- Each app server then uses its own local cache
Decision Framework 🎯
Ask yourself: WHY did we distribute the cache?
If Data Was Too Large:
WHY: Data doesn't fit on one server
HOW: Shard data across servers
RESULT: Different servers have different data
ROUTING: Consistent hashing

If Requests Were Too High:
WHY: One server can't handle request volume
HOW: Replicate data across servers
RESULT: Different servers have same data
ROUTING: Round-robin

Example: Twitter's Elon Musk Profile
Problem: Elon Musk's Twitter profile is viewed 100 million times per day.
Naive solution: Compute the profile once, cache it in a single Redis server.
Bottleneck: A single cache server cannot handle 100 million requests per day.
Optimized solution:
- Cache Elon Musk's profile in multiple Redis servers with identical data
- Use round-robin to distribute requests across all cache servers
- Each server serves the same cached profile, but load is distributed
Local Cache with Round-Robin 🔄
What happens if you use round-robin with local caches?
Requests are distributed evenly across app servers. Each server caches the data it encounters. Eventually, all servers end up with similar cached data.
Use case: High request volume where a single server cannot handle the load, but data size is manageable. Multiple servers process the same type of requests using identical cached data.
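This convergence can be simulated with a cache-aside loop; the backing store, keys, and server count here are hypothetical:

```python
from itertools import cycle

# Hypothetical source of truth (e.g., a database behind the app servers).
BACKING_STORE = {"product:1": "widget", "product:2": "gadget"}

class AppServer:
    def __init__(self):
        self.cache = {}

    def get(self, key):
        # Cache-aside: on a miss, fetch from the backing store and keep a copy.
        if key not in self.cache:
            self.cache[key] = BACKING_STORE[key]
        return self.cache[key]

servers = [AppServer(), AppServer(), AppServer()]
rotation = cycle(servers)

# After a few rounds of round-robin traffic touching every key,
# each server's local cache converges to the same contents.
for _ in range(3):
    for key in BACKING_STORE:
        next(rotation).get(key)

assert all(s.cache == BACKING_STORE for s in servers)
```

The first request for each key on each server is a miss, so per-server hit rates start lower than with a shared cache, but once warmed, every server answers from local memory.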
Next: Cache eviction policies — LRU, FIFO, and why LRU wins 99% of the time.