Consistent Hashing in Practice: Caching, Fan-Out Queries, and Implementation Details

Q: What is "Consistent Hashing in Practice: Caching, Fan-Out Queries, and Implementation Details" about?

When consistent hashing applies beyond the database tier - cache clusters, notification brokers, stream partitions - why fan-out queries demand a better sharding key, and the implementation details senior interviews probe: k hash functions, collision analysis, the k=64 sweet spot, and latency estimation at Google scale.

Q: What topics does "Consistent Hashing in Practice: Caching, Fan-Out Queries, and Implementation Details" cover?

This article covers: consistent hashing caching, fan out queries, scatter gather, k virtual nodes, collision analysis, capacity weighted hashing.

Here is what took me a while to fully appreciate: consistent hashing is not just a database routing trick. The moment any server tier holds state, the same problem appears - and the same solution applies.

The Stateful Rule

The clearest way to think about it: whenever a server tier holds state, use consistent hashing. When it is stateless, round-robin is enough.

Application servers are stateless - route anywhere. Database servers are stateful - consistent hashing routes correctly. Cache servers are stateful. Notification brokers that maintain per-user queues are stateful. Real-time data pipelines that partition streams by user or topic are stateful. Every one of those tiers needs consistent hashing for the same reason the database does.

Caching Systems

Distributed caches (Redis Cluster, Memcached) store user-specific or object-specific data. A cache hit requires routing the request to the same node that stored the entry. If the routing is wrong, you get a cache miss - the data is there, but on the wrong node. Consistent hashing is the standard strategy for cache clusters.

Fan-Out Queries and the Sharding Key Problem

Here is a sharp limitation that consistent hashing cannot solve on its own.

If your sharding key is user_id and a single request needs data across multiple users - "show me all activity from everyone in this workspace" - the load balancer has to query every shard that holds any of those users' data. This is called a fan-out or scatter-gather query, and it is expensive: higher latency, more I/O, more servers under load simultaneously.

The fix is not a better routing algorithm. It is a better sharding key.

If cross-user queries are the dominant pattern for a given grouping (workspace, team, region), shard by that grouping instead of by user. All members of a workspace land on the same shard. Queries within that workspace hit one server. The routing algorithm can only distribute load evenly - your access patterns determine what to distribute on.

Generating k Hash Functions from One

You do not write k separate hash functions. You parameterize a single one:

Plain text

h1 = SHA256(server_id + "1")
h2 = SHA256(server_id + "2")
h3 = SHA256(server_id + "3")
...
hk = SHA256(server_id + "k")

Each call with a different suffix produces a different, independent-looking output. Same underlying algorithm, different seeds - effectively k distinct hash functions. Real consistent hashing libraries all work this way. The same idea is used in password salting: one hash function, different inputs.

Collision Analysis

With a 64-bit output space, there are 2⁶⁴ ≈ 18 quintillion possible hash values. The probability that any two specific virtual nodes land on the same ring position is roughly 1 in 18 quintillion.

If a collision does happen, the effect is minor: one server loses one virtual spot out of k. With 63 remaining spots instead of 64, load distribution barely shifts. The probability that all virtual nodes collide is 1 / (2⁶⁴)^(n × k) - for 10 servers at k=64, a denominator with thousands of zeros. Collisions do not affect the algorithm in practice.

Why k=64 Is the Sweet Spot

Hash outputs look random even though they are deterministic. With k=1, you might get very unequal arc sizes purely by chance - like flipping a coin once and getting heads does not tell you much. With k=64, the law of large numbers applies: 64 independent positions spread much more evenly across the ring.

Below k=16, distribution is noticeably unequal. Above k=64, improvement is marginal while the ring grows larger and binary search takes slightly longer. Nobody uses k above 64 in production.

Capacity-Weighted Hashing

The vanilla algorithm assigns the same k to every server regardless of hardware. If you have heterogeneous servers - some with 64 GB RAM, some with 256 GB - you can give the more powerful machines more virtual spots.

When the Load Balancer pings a server for a health check, it can also fetch the server's hardware specs. A server with 4x the resources gets 4x the virtual spots, absorbing 4x the traffic. This is a direct extension of consistent hashing, not a different algorithm.

Latency Estimation

For Google's scale: 10 million servers, k=64 → 640 million ring entries. Binary search takes:

Plain text

log₂(640,000,000) = log₂(64) + log₂(10,000,000)
                  ≈ 6 + 23
                  ≈ 30 comparisons

RAM random access: approximately 10 nanoseconds. Operations per binary search iteration: roughly 10. Total:

Plain text

30 iterations × 10 operations × 10 ns = 3,000 ns = 3 microseconds

One routing decision at Google's full scale takes about 3 microseconds. The overhead is invisible compared to the cost of the actual request.

[!TIP] The key facts to memorise: 2^10 ≈ 1K, 2^20 ≈ 1M, 2^30 ≈ 1B. From these, you can estimate any log₂ value in under 10 seconds. Interviewers at senior level ask these calculations cold. Practice them until they are reflex.

The Essentials

Any stateful tier needs consistent hashing - databases, cache clusters, notification brokers, stream partitions: the rule is the same everywhere. Stateless tiers use round-robin; stateful tiers need deterministic routing.
Fan-out queries signal a wrong sharding key - consistent hashing distributes evenly whatever unit you shard on, so if cross-unit queries are common, the sharding key should reflect the real access boundary (workspace, region, team).
k=64 is the production sweet spot - below 16, distribution is visibly unequal; above 64, marginal gains disappear while ring size and binary search cost grow. Capacity-weighted hashing extends this by giving more powerful servers proportionally more virtual spots.

Consistent Hashing in Practice: Caching, Fan-Out Queries, and Implementation Details

The Stateful Rule

Caching Systems

Fan-Out Queries and the Sharding Key Problem

Generating k Hash Functions from One

Collision Analysis

Why k=64 Is the Sweet Spot

Capacity-Weighted Hashing

Latency Estimation

The Essentials

Further Reading and Watching