Consistent Hashing Hash Functions and The Ring
The Bold Claim
Consistent hashing provides:
- ✅ Fast computation
- ✅ Equal load distribution
- ✅ Dynamic server management
- ✅ Minimal data movement
- ✅ Deterministic without synchronization
All five requirements. Zero drawbacks.
Let's see how.
Foundation: Hash Functions
What is a Function? (Mathematical Definition)
Function: Deterministic mapping from inputs to outputs.
def add(a, b):
return a + b
# add(2, 3) = 5 today
# add(2, 3) = 5 tomorrow
# add(2, 3) = 5 foreverSame input always produces same output.
Hash Functions Defined
Hash function: Function with two properties:
- Accepts any input (unbounded)
- Returns value in fixed range (bounded)
Compresses information into fixed-size output.
Examples
def hash_1(data):
ascii_sum = sum(ord(char) for char in str(data))
return ascii_sum % 100
# Output range: 0-99
def hash_2(data):
ascii_sum = sum(ord(char) for char in str(data))
return ascii_sum % 799
# Output range: 0-798Industry hash functions:
- MD5
- SHA-256, SHA-512
- MurmurHash3
Visualizing Hash Output: The Ring 💍
Modulo operation wraps around:
0, 1, 2, ..., 98, 99, 0, 1, 2, ...Ring visualization:
0
/ \
99 1
| |
98 2
| |
... 50 ...For hash_1: Ring with 100 positions (0-99) For hash_2: Ring with 799 positions (0-798)
Important: Logical ring, not physical drawing.
Consistent Hashing Setup
Requires k+1 hash functions:
- 1 hash function for users
- k hash functions for servers
All functions share same output space (same ring size).
Step 1: Hash Users onto Ring
hash_user("sanjana") = 23 # Sanjana at position 23
hash_user("santosh") = 59 # Santosh at position 59Each user gets one fixed position.
Key property: Deterministic
- Sanjana always hashes to 23
- Every request from Sanjana → position 23
- Never changes
Step 2: Hash Servers onto Ring (k times)
Choose k (typical values: 32 or 64)
Each server gets k positions:
# Server A with k=5
hash_server_1("A") = 12 # A at position 12
hash_server_2("A") = 45 # A at position 45
hash_server_3("A") = 67 # A at position 67
hash_server_4("A") = 78 # A at position 78
hash_server_5("A") = 91 # A at position 91Repeat for all servers (B, C, D).
Result: Each server has k virtual positions on ring.
Virtual Spots ≠ Replication
Clarification: We're not creating 5 copies of Server A.
Physical reality: One Server A exists.
Virtual representation: Server A has 5 routing positions.
Think of it as nicknames:
- Person: Sai
- Home: "Krishna"
- Girlfriend: "Bubu"
Same person, multiple labels. Virtual spots are routing labels, not physical servers.
The Routing Rule 🎯
When user sends request:
Go clockwise from user's position until you hit a server.
Examples:
Ring: [Sanjana(23), Server C(30), Abhilash(40), Server A(50), Santosh(60)]
Sanjana → Start at 23 → Clockwise → Server C at 30
Abhilash → Start at 40 → Clockwise → Server A at 50
Santosh → Start at 60 → Clockwise → Wraps to Server B at 5That's the complete algorithm:
- Hash user → Get position
- Go clockwise
- First server encountered = destination
Implementation: Ring as Sorted Array
Ring stored as sorted array:
ring = [
(3, B), (12, C), (17, B), (20, D), (23, A),
(30, D), (40, A), (40, D), (55, C), (81, A),
(87, C), (99, B)
]Routing via binary search:
def route_request(user_id):
user_position = hash_user(user_id)
# Binary search for upper bound
server_index = binary_search_upper_bound(ring, user_position)
return ring[server_index][1] # Return server IDComplexity: O(log(n × k)) where n = servers, k = virtual spots
Example: 10M servers, k=64
- Ring size: 640M
- Binary search: log₂(640M) ≈ 30 operations
- Extremely fast
Key Takeaways
- Hash functions compress unbounded input to bounded output
- Ring visualization helps understand modulo wraparound
- Users get 1 position each (deterministic)
- Servers get k positions each (virtual spots)
- Clockwise routing finds destination server
- Binary search makes it fast (O(log n×k))