Consistent Hashing in Practice: Caching, Fan-Out Queries, and Implementation Details
When consistent hashing applies beyond the database tier, why fan-out queries demand better sharding keys, and the implementation details that come up in senior interviews: generating k functions, collision analysis, the k=64 sweet spot, and latency estimation.
Here is what took me a while to fully appreciate: consistent hashing is not just a database routing trick. The moment any server tier holds state, the same problem appears - and the same solution applies.
The Stateful Rule
The clearest way to think about it: whenever a server tier holds state, use consistent hashing. When it is stateless, round-robin is enough.
Application servers are stateless - route anywhere. Database servers are stateful - consistent hashing routes correctly. Cache servers are stateful. Notification brokers that maintain per-user queues are stateful. Real-time data pipelines that partition streams by user or topic are stateful. Every one of those tiers needs consistent hashing for the same reason the database does.
Caching Systems
Distributed caches (Redis Cluster, Memcached) store user-specific or object-specific data. A cache hit requires routing the request to the same node that stored the entry. If the routing is wrong, you get a cache miss - the data is there, but on the wrong node. Consistent hashing is the standard strategy for cache clusters.
Fan-Out Queries and the Sharding Key Problem
Here is a sharp limitation that consistent hashing cannot solve on its own.
If your sharding key is user_id and a single request needs data across multiple users - "show me all activity from everyone in this workspace" - the load balancer has to query every shard that holds any of those users' data. This is called a fan-out or scatter-gather query, and it is expensive: higher latency, more I/O, more servers under load simultaneously.
The fix is not a better routing algorithm. It is a better sharding key.
If cross-user queries are the dominant pattern for a given grouping (workspace, team, region), shard by that grouping instead of by user. All members of a workspace land on the same shard. Queries within that workspace hit one server. The routing algorithm can only distribute load evenly - your access patterns determine what to distribute on.
Generating k Hash Functions from One
You do not write k separate hash functions. You parameterize a single one:
h1 = SHA256(server_id + "1")
h2 = SHA256(server_id + "2")
h3 = SHA256(server_id + "3")
...
hk = SHA256(server_id + "k")Each call with a different suffix produces a different, independent-looking output. Same underlying algorithm, different seeds - effectively k distinct hash functions. Real consistent hashing libraries all work this way. The same idea is used in password salting: one hash function, different inputs.
Collision Analysis
With a 64-bit output space, there are 2⁶⁴ ≈ 18 quintillion possible hash values. The probability that any two specific virtual nodes land on the same ring position is roughly 1 in 18 quintillion.
If a collision does happen, the effect is minor: one server loses one virtual spot out of k. With 63 remaining spots instead of 64, load distribution barely shifts. The probability that all virtual nodes collide is 1 / (2⁶⁴)^(n × k) - for 10 servers at k=64, a denominator with thousands of zeros. Collisions do not affect the algorithm in practice.
Why k=64 Is the Sweet Spot
Hash outputs look random even though they are deterministic. With k=1, you might get very unequal arc sizes purely by chance - like flipping a coin once and getting heads does not tell you much. With k=64, the law of large numbers applies: 64 independent positions spread much more evenly across the ring.
Below k=16, distribution is noticeably unequal. Above k=64, improvement is marginal while the ring grows larger and binary search takes slightly longer. Nobody uses k above 64 in production.
Capacity-Weighted Hashing
The vanilla algorithm assigns the same k to every server regardless of hardware. If you have heterogeneous servers - some with 64 GB RAM, some with 256 GB - you can give the more powerful machines more virtual spots.
When the Load Balancer pings a server for a health check, it can also fetch the server's hardware specs. A server with 4x the resources gets 4x the virtual spots, absorbing 4x the traffic. This is a direct extension of consistent hashing, not a different algorithm.
Latency Estimation
For Google's scale: 10 million servers, k=64 → 640 million ring entries. Binary search takes:
log₂(640,000,000) = log₂(64) + log₂(10,000,000)
≈ 6 + 23
≈ 30 comparisonsRAM random access: approximately 10 nanoseconds. Operations per binary search iteration: roughly 10. Total:
30 iterations × 10 operations × 10 ns = 3,000 ns = 3 microsecondsOne routing decision at Google's full scale takes about 3 microseconds. The overhead is invisible compared to the cost of the actual request.
[!TIP] The key facts to memorise:
2^10 ≈ 1K,2^20 ≈ 1M,2^30 ≈ 1B. From these, you can estimate any log₂ value in under 10 seconds. Interviewers at senior level ask these calculations cold. Practice them until they are reflex.
The Essentials
- Any stateful tier needs consistent hashing - databases, cache clusters, notification brokers, stream partitions: the rule is the same everywhere. Stateless tiers use round-robin; stateful tiers need deterministic routing.
- Fan-out queries signal a wrong sharding key - consistent hashing distributes evenly whatever unit you shard on, so if cross-unit queries are common, the sharding key should reflect the real access boundary (workspace, region, team).
- k=64 is the production sweet spot - below 16, distribution is visibly unequal; above 64, marginal gains disappear while ring size and binary search cost grow. Capacity-weighted hashing extends this by giving more powerful servers proportionally more virtual spots.
Further Reading and Watching
- Consistent Hashing - MIT 6.824 Distributed Systems (YouTube) - academic-level treatment of the algorithm including the original Karger et al. paper context.
- Distributed Caching with Consistent Hashing - Martin Kleppmann's Data-Intensive Applications Blog - how consistent hashing applies to cache clusters and what breaks when routing diverges.
Practice what you just read.
Keep reading