Consistent Hashing

Hash functions, the consistent hashing ring, virtual nodes, binary search routing, a worked example, and performance at Google scale.

April 4, 20267 min read5 / 7

What Is a Hash Function?

A function in the mathematical sense is a deterministic mapping from input to output. Given the same input, it always returns the same output, no matter how many times it is called. This is different from what is often loosely called a "function" in code -- a procedure that reads from a global variable and returns different results on different calls is not a function in the strict sense.

A hash function is a function with one additional property: the output is always within a fixed, bounded range, regardless of the size or content of the input.

  • Inputs: unbounded. Anything can be fed in -- strings, numbers, objects.
  • Output: always within a fixed range, such as [0, 99] or [0, 2^32 - 1].

A simple example:

Plain text
hash_1(data): total = sum of ASCII values of each character in str(data) return total % 100

This function is deterministic -- the same input always gives the same output. And it is bounded -- the result is always between 0 and 99 because of the modulo.

Hash Function: Unbounded Input to Bounded Output ExpandHash Function: Unbounded Input to Bounded Output

Real hash functions used in production (like MD5, SHA-256, or MurmurHash) follow the same principle but distribute outputs much more uniformly -- a small change in input produces a completely different output. This uniform distribution property is what makes them useful for routing and sharding.

Hash Function Output Range

A hash function does not require you to explicitly specify the output range. The range is determined by the data type of the return value.

If a hash function returns a long, the output is automatically bounded between 0 and 2^64 - 1. A long is a 64-bit integer, so those are the only values it can hold. You get the bounded output for free from the type system.

The modulo operation (% N) is a separate step you apply on top of this -- to map the large bounded output down to a smaller range like 0 to 99 for use on a ring or a shard table. But the hash function itself is already bounded by its return type.

How to Calculate log₂ in Your Head

Binary search on the consistent hashing ring runs in O(log N) where N = numServers × k. When working with numbers at scale, you need to estimate log₂ quickly. The following three approximations handle most cases:

ValuePower of 2
1,000 (thousand)≈ 2¹⁰
1,000,000 (million)≈ 2²⁰
1,000,000,000 (billion)≈ 2³⁰

This comes from the fact that 2¹⁰ = 1,024 ≈ 1,000. Each additional factor of 1,000 adds 10 to the exponent.

Example: How many binary search iterations for an array of 64 billion entries?

Plain text
log₂(64 billion) = log₂(64 × 1,000,000,000) = log₂(64) + log₂(1,000,000,000) = 6 + 30 = 36

64 is 2⁶, billion is ≈ 2³⁰. Done in two steps. The actual value is 35.89 -- the estimate is off by 0.11.

Consistent Hashing

Consistent hashing is the routing algorithm that satisfies all five properties. Here is how it works, step by step.

The Ring

Take the output space of the hash function and visualize it as a ring. If the hash function outputs values between 0 and 99, the ring has 100 positions. Position 99 wraps back around to position 0. This is a logical construct -- not a data structure drawn on screen -- but it is the mental model that makes the algorithm easy to reason about.

Placing Users on the Ring

For every user, hash their user ID with one hash function. The output is a position on the ring. Place the user at that position.

Because the hash function is deterministic, a user's ID will always hash to the same position. Their spot on the ring is fixed and stable. Every time they send a request, they are at the same position.

Placing Servers on the Ring

For every server, hash the server ID k times, using k different hash functions. Each hash produces a different position, so each server occupies k spots on the ring. These are called virtual nodes.

With four servers (A, B, C, D) and k = 5, there are 20 virtual nodes total -- 5 per server. Server A has 5 different positions on the ring. Server B has 5 different positions. And so on.

No actual copies of the server are created. These are just positions -- entries in a sorted array that say "server A is also at position 37."

A typical value of k in production is 32 or 64. The higher k is, the more evenly distributed the load becomes.

Routing a Request

When a user sends a request:

  1. Find the user's position on the ring (hash their user ID).
  2. Walk clockwise from that position.
  3. The first server virtual node encountered is the server that handles the request.

That is the entire algorithm.

Sanjana is at position 5. Walking clockwise, the first server node is A at position 12. Sanjana goes to Server A. Ablesh is at position 20. The first server clockwise is C at position 29. Ablesh goes to Server C. Santosh is at position 34. The first server clockwise is B at position 47. Santosh goes to Server B.

Consistent Hashing Ring ExpandConsistent Hashing Ring

Implementation: A Sorted Array

In code, the ring is just a sorted array of (hashValue, serverID) pairs. To route a request:

JavaScript
function buildRing(servers, k, hashFns) { const ring = []; for (const server of servers) { for (let i = 0; i < k; i++) { const position = hashFns[i](server); ring.push({ position, server }); } } ring.sort((a, b) => a.position - b.position); return ring; } function route(userId, ring, hashFn) { const userPosition = hashFn(userId); // Find the first ring entry with position >= userPosition for (const node of ring) { if (node.position >= userPosition) return node.server; } // Wrap around: if no server found clockwise, take the first one return ring[0].server; }

The lookup is O(log N) with binary search on the sorted array (where N is servers × k).

Building the Ring: A Worked Example

With servers [A, B, C, D] and k = 3, here is what the ring construction produces:

ServerHash 1Hash 2Hash 3
A238140
B31799
C128755
D203040*

(*collision with A at 40 -- handled by tie-breaking in binary search)

All 12 entries go into an array and get sorted by hash value:

Plain text
[(3, B), (12, C), (17, B), (20, D), (23, A), (30, D), (40, A), (40, D), (55, C), (81, A), (87, C), (99, B)]

This sorted array is the ring.

When user Udit sends a request:

  1. Hash Udit's user ID: hash(udit, 0) → 33
  2. Binary search the ring for the first entry with position ≥ 33
  3. That entry is (40, A) -- the first server clockwise from position 33

Udit's request goes to Server A.

Why the algorithm must be deterministic: if Udit's user ID does not always hash to the same value, his requests would scatter across different servers. Half his data on Server A, half on Server C. When he tries to read, neither server has the complete picture and the data looks lost.

Because the hash function is deterministic, Udit always hashes to 33, and Server A is always at position 40. Udit always goes to Server A. His data is always in one place.

Performance: How Fast Is Consistent Hashing?

The ring lookup is a binary search on an array of size numServers × k.

At Google's scale -- 10 million servers with k = 64:

Plain text
ring size = 10,000,000 × 64 = 640,000,000 log₂(640,000,000) ≈ 30

30 comparisons in memory to route any request.

Where the 3 Microseconds Figure Comes From

A modern DDR5 RAM module has a random access time of approximately 10 nanoseconds.

In one iteration of binary search, the CPU performs a fixed number of memory operations: read the low index, read the high index, compute the midpoint, read the array element at mid, compare. Using 10 as a reasonable estimate for memory reads per iteration:

Plain text
Cost per iteration = 10 memory reads × 10 ns = 100 ns Iterations for 640 million entries = ~30 Total = 100 ns × 30 = 3,000 ns = 3 microseconds

This is the overhead per routed request. At 3 µs, consistent hashing routing is negligible relative to the actual latency of a database query (typically 1-100 ms).

Practice what you just read.

Quiz: Consistent Hashing
1 exercise

Enjoyed this? Get more like it.

Deep dives on system design, React, web development, and personal finance — straight to your inbox. Free, always.