Routing Algorithms
The five properties of a good routing algorithm, and why round-robin, bucketing, and the mapping table each fall short of all five.
What Makes a Good Routing Algorithm
Every incoming request must be routed. The routing algorithm executes on every single request, so its properties matter a lot. A good routing algorithm has five properties:
1. Fast. The algorithm must compute in near-zero time. If routing adds meaningful latency to every request, it becomes a bottleneck that touches the entire system.
2. Equal load distribution. Requests and data should be spread roughly evenly across all servers. It does not need to be exact -- 33%, 34%, 33% is fine -- but a lopsided distribution like 90%, 5%, 5% defeats the purpose of having multiple servers.
3. Handle server changes. Servers crash. New servers are added. The routing algorithm must adapt to these changes without manual reconfiguration.
4. Minimize data movement. When a server is added or removed, some data will need to move to different servers. A good algorithm moves only the data that must move. Moving unnecessary data wastes bandwidth, slows the system, and risks data loss during the transfer.
5. Deterministic with no information exchange. Given the same input and the same server list, the algorithm must always produce the same output. It should not be probabilistic. And critically, it should be able to compute the answer locally -- without contacting other nodes to ask where something lives.
Round-Robin Routing
The simplest routing algorithm is modular (mod-N) hashing, often loosely called round-robin.
Extract the user ID from the request. Divide it by N (the number of servers). The remainder is the server index.
```js
const serverList = ['A', 'B', 'C', 'D']; // updated by health checks
const N = serverList.length;

function route(request) {
  const userId = extractUserId(request);
  const serverIndex = userId % N;
  const server = serverList[serverIndex];
  forwardRequest(server, request);
}
```

With four servers and sequential user IDs, the assignment looks like this:
| User ID | userID % 4 | Server |
|---|---|---|
| 0 | 0 | A |
| 1 | 1 | B |
| 2 | 2 | C |
| 3 | 3 | D |
| 4 | 0 | A |
| 5 | 1 | B |
The pattern repeats. Every user gets a stable, deterministic assignment. User 4 always goes to A, user 5 always goes to B, and so on.
*Figure: Round-Robin Routing with Mod N*
The Problem with Round-Robin
Round-robin is fast, deterministic, and distributes load evenly -- when the server list is stable. But when a server crashes or is added, N changes. And when N changes, almost every user remaps to a different server.
Say Server B crashes. The server list shrinks from [A, B, C, D] to [A, C, D]. N becomes 3.
| User ID | Old (N=4) | New (N=3) | Changed? |
|---|---|---|---|
| 0 | A | A | No |
| 1 | B | C | Yes |
| 2 | C | D | Yes |
| 3 | D | A | Yes |
| 4 | A | C | Yes |
| 5 | B | D | Yes |
Only users whose userID % 4 and userID % 3 happen to resolve to the same server stay put -- in this example, three out of every twelve. The other 75% remap to a different server. This means almost all the data needs to move -- exactly the property a good routing algorithm should avoid.
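A quick simulation makes the scale of the reshuffle concrete. This self-contained sketch (the server lists mirror the example above) counts how many of the first 1,000 users remap when Server B crashes:

```js
// Count how many users change servers when the list shrinks from 4 to 3.
const oldList = ['A', 'B', 'C', 'D'];
const newList = ['A', 'C', 'D']; // Server B has crashed

const total = 1000;
let moved = 0;
for (let userId = 0; userId < total; userId++) {
  const before = oldList[userId % oldList.length];
  const after = newList[userId % newList.length];
  if (before !== after) moved++;
}
console.log(`${moved} of ${total} users remapped`); // → "750 of 1000 users remapped"
```

Three quarters of all users move, even though only a quarter of the capacity disappeared.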
This is the fundamental weakness of simple modular hashing.
When Round-Robin Is Acceptable
Round-robin has real advantages: it is extremely fast, it distributes load equally, and multiple load balancers can run it independently without coordinating with each other -- each one just applies the same formula locally.
The problem is purely the changing N. If you can guarantee that N is fixed, round-robin is perfectly fine.
Elasticsearch does exactly this. It forces the number of shards to be declared upfront and does not allow it to change after the fact. Because N never changes, the modular hash always produces the same assignment, and the massive-data-movement problem never occurs.
For any system where the number of servers can change -- which is most systems -- round-robin should not be used.
Bucketing (Range-Based Routing)
A variation on round-robin is to assign ranges of user IDs to servers instead of using modular arithmetic.
Given 400 users and 4 servers, each bucket holds 100 users:
- Users 0-99 → Server A
- Users 100-199 → Server B
- Users 200-299 → Server C
- Users 300-399 → Server D
```js
const bucketSize = Math.floor(totalUsers / numServers); // 400 / 4 = 100
const serverIndex = Math.floor(userId / bucketSize);
```

The logic is simple and easy to reason about. But the same fundamental problem applies. When a server crashes and N drops from 4 to 3, the bucket size recalculates to 133. Every bucket boundary shifts, a large share of users land in different buckets, and their data has to move.
And bucketing has an additional problem round-robin does not: new user sign-ups break the bucket boundaries. A 401st user falls outside the last bucket entirely, and as totalUsers grows, the totalUsers / numServers calculation shifts every boundary, silently moving existing users who sit near one. Every wave of registrations can change existing users' routing assignments. This is completely impractical.
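To see the boundary shift in numbers, here is a minimal sketch of the bucket formula above, with a `Math.min` clamp added (an assumption, to keep the last partial bucket on the last server):

```js
// Bucket routing: which server index owns a given user.
// The min() clamp keeps the final partial bucket on the last server.
function bucketFor(userId, totalUsers, numServers) {
  const bucketSize = Math.floor(totalUsers / numServers);
  return Math.min(Math.floor(userId / bucketSize), numServers - 1);
}

// With 400 users and 4 servers, user 100 opens Server B's bucket:
console.log(bucketFor(100, 400, 4)); // → 1

// A few sign-ups later, totalUsers is 404, bucketSize grows to 101,
// and user 100 silently shifts back into Server A's bucket:
console.log(bucketFor(100, 404, 4)); // → 0
```

No server crashed and no user asked to move; growth alone changed an existing user's assignment.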
Mapping Table (Hash Map)
A different approach: explicitly track which user belongs to which server in a hash map. No formula. Just a direct lookup table.
`{ user_0 → ServerA, user_1 → ServerB, user_2 → ServerC, ... }`

When a request arrives, look up the user in the map and route to the recorded server. O(1) lookup.
When a server crashes: iterate through the table, find all entries pointing to the crashed server, and reassign them to other servers. Only the users who were on the crashed server get moved. Every other user stays exactly where they are. No unnecessary data movement.
When a new server is added: calculate how many users should migrate to rebalance the load (totalUsers / newN). Randomly select that many users from the existing entries and reassign them to the new server. Load stays roughly equal.
```js
// Handle a crashed server: reassign only its users.
for (const [userId, server] of mappingTable) {
  if (server === crashedServer) {
    const newServer = pickRandomHealthyServer();
    mappingTable.set(userId, newServer);
    migrateData(userId, crashedServer, newServer);
  }
}

// Handle a new server: move just enough users to rebalance.
const usersToMove = Math.floor(totalUsers / newNumServers);
const randomUsers = pickRandom([...mappingTable.keys()], usersToMove);
for (const userId of randomUsers) {
  const oldServer = mappingTable.get(userId); // record where the user was
  mappingTable.set(userId, newServer);
  migrateData(userId, oldServer, newServer);
}
```

This solves the unnecessary data movement problem. Only the affected users get moved. But there is a significant tradeoff.
The tradeoff: the mapping table itself can be enormous. A system with 100 million users has a mapping table with 100 million entries. That table must be stored somewhere in memory. And if multiple load balancers are running in parallel, they all need to read from and write to the same table -- which means they need to coordinate.
*Figure: Routing Algorithms Compared*
Why the Mapping Table Breaks at Scale
The memory concern is real but manageable. Each entry in the table stores a userID (8 bytes) and a serverID (4 bytes) -- 12 bytes per user. With 5 billion users, that is 60 GB of memory. Large, but not impossible with modern hardware.
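The arithmetic is quick to check (the 8-byte and 4-byte sizes are the assumptions from the text):

```js
// Back-of-envelope memory estimate for the full mapping table.
const bytesPerEntry = 8 + 4;        // 8-byte userID + 4-byte serverID
const users = 5_000_000_000;        // 5 billion users
const gigabytes = (bytesPerEntry * users) / 1e9; // decimal GB
console.log(`${gigabytes} GB`);     // → "60 GB"
```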
The fatal problem is synchronization. There are multiple load balancers running in parallel. Every load balancer must have the exact same mapping table at all times. If two load balancers diverge -- one updates a user's server assignment and the other does not yet know -- requests from the same user go to two different servers. That user's data appears inconsistent or missing.
Keeping all load balancers synchronized with low latency is not just technically difficult; it runs into a fundamental limit. The CAP theorem (covered in a later lecture) proves that when the network partitions, a distributed system cannot remain both consistent and available. So the mapping table, while it solves the data movement problem, creates a coordination problem with no clean solution.
Summary of what each algorithm achieves:
| Property | Round-Robin | Bucketing | Mapping Table |
|---|---|---|---|
| Fast | Yes | Yes | Yes |
| Equal distribution | Yes | Yes | Yes |
| Handles server changes | No (reshuffles all) | No (reshuffles all) | Yes |
| Minimal data movement | No | No | Yes |
| No LB coordination needed | Yes | Yes | No (must sync) |
No algorithm has all five. This is why consistent hashing exists.