Consistent Hashing Hash Functions and The Ring

March 5, 20263 min read
system designhigh level designHLDdistributed systemsscalabilitymicroservicesload balancingcachingdatabase designAPI designsoftware architecture

The Bold Claim

Consistent hashing provides:

  • ✅ Fast computation
  • ✅ Equal load distribution
  • ✅ Dynamic server management
  • ✅ Minimal data movement
  • ✅ Deterministic without synchronization

All five requirements. Zero drawbacks.

Let's see how.

Foundation: Hash Functions

What is a Function? (Mathematical Definition)

Function: Deterministic mapping from inputs to outputs.

def add(a, b): return a + b # add(2, 3) = 5 today # add(2, 3) = 5 tomorrow # add(2, 3) = 5 forever

Same input always produces same output.

Hash Functions Defined

Hash function: Function with two properties:

  1. Accepts any input (unbounded)
  2. Returns value in fixed range (bounded)

Compresses information into fixed-size output.

Examples

def hash_1(data): ascii_sum = sum(ord(char) for char in str(data)) return ascii_sum % 100 # Output range: 0-99 def hash_2(data): ascii_sum = sum(ord(char) for char in str(data)) return ascii_sum % 799 # Output range: 0-798

Industry hash functions:

  • MD5
  • SHA-256, SHA-512
  • MurmurHash3

Visualizing Hash Output: The Ring 💍

Modulo operation wraps around:

0, 1, 2, ..., 98, 99, 0, 1, 2, ...

Ring visualization:

0 / \ 99 1 | | 98 2 | | ... 50 ...

For hash_1: Ring with 100 positions (0-99) For hash_2: Ring with 799 positions (0-798)

Important: Logical ring, not physical drawing.

Consistent Hashing Setup

Requires k+1 hash functions:

  • 1 hash function for users
  • k hash functions for servers

All functions share same output space (same ring size).

Step 1: Hash Users onto Ring

hash_user("sanjana") = 23 # Sanjana at position 23 hash_user("santosh") = 59 # Santosh at position 59

Each user gets one fixed position.

Key property: Deterministic

  • Sanjana always hashes to 23
  • Every request from Sanjana → position 23
  • Never changes

Step 2: Hash Servers onto Ring (k times)

Choose k (typical values: 32 or 64)

Each server gets k positions:

# Server A with k=5 hash_server_1("A") = 12 # A at position 12 hash_server_2("A") = 45 # A at position 45 hash_server_3("A") = 67 # A at position 67 hash_server_4("A") = 78 # A at position 78 hash_server_5("A") = 91 # A at position 91

Repeat for all servers (B, C, D).

Result: Each server has k virtual positions on ring.

Virtual Spots ≠ Replication

Clarification: We're not creating 5 copies of Server A.

Physical reality: One Server A exists.

Virtual representation: Server A has 5 routing positions.

Think of it as nicknames:

  • Person: Sai
  • Home: "Krishna"
  • Girlfriend: "Bubu"

Same person, multiple labels. Virtual spots are routing labels, not physical servers.

The Routing Rule 🎯

When user sends request:

Go clockwise from user's position until you hit a server.

Examples:

Ring: [Sanjana(23), Server C(30), Abhilash(40), Server A(50), Santosh(60)] Sanjana → Start at 23 → Clockwise → Server C at 30 Abhilash → Start at 40 → Clockwise → Server A at 50 Santosh → Start at 60 → Clockwise → Wraps to Server B at 5

That's the complete algorithm:

  1. Hash user → Get position
  2. Go clockwise
  3. First server encountered = destination

Implementation: Ring as Sorted Array

Ring stored as sorted array:

ring = [ (3, B), (12, C), (17, B), (20, D), (23, A), (30, D), (40, A), (40, D), (55, C), (81, A), (87, C), (99, B) ]

Routing via binary search:

def route_request(user_id): user_position = hash_user(user_id) # Binary search for upper bound server_index = binary_search_upper_bound(ring, user_position) return ring[server_index][1] # Return server ID

Complexity: O(log(n × k)) where n = servers, k = virtual spots

Example: 10M servers, k=64

  • Ring size: 640M
  • Binary search: log₂(640M) ≈ 30 operations
  • Extremely fast

Key Takeaways

  1. Hash functions compress unbounded input to bounded output
  2. Ring visualization helps understand modulo wraparound
  3. Users get 1 position each (deterministic)
  4. Servers get k positions each (virtual spots)
  5. Clockwise routing finds destination server
  6. Binary search makes it fast (O(log n×k))