Why Your Single Server Won't Scale (And What To Do About It)

March 5, 2026 · 4 min read
system design, high level design, HLD, distributed systems, scalability, microservices, load balancing, caching, database design, API design, software architecture

The Problem: Joshua's Delicious App

Joshua built Delicious, a bookmark management app. Initially, it ran on his laptop:

  • 40 GB disk space
  • Handled early users fine
  • Simple, single-server architecture

Then success hit. Users flooded in. His laptop couldn't handle the load.

Two Scaling Approaches

Vertical Scaling: Bigger Server 💪

Concept: Replace your server with a more powerful one.

Joshua's options:

  • Laptop (40 GB) → Desktop (1 TB disk, better CPU)
  • Desktop → Workstation (2 TB, powerful CPU)
  • Workstation → Server-grade machine (10 TB, enterprise CPU)

Advantages:

  • Simple to implement
  • No code changes needed
  • Immediate performance boost

Fatal flaws:

  1. Hardware ceiling: You can't buy infinite RAM/CPU
  2. Cost explosion: Exponentially expensive at high end
  3. Single point of failure: One machine crash = total outage

Horizontal Scaling: More Servers 🚀

Concept: Add more servers to distribute the load.

Joshua's solution:

  • Buy 500 cheap laptops (40 GB each)
  • Total capacity: 20 TB (vs. single 10 TB server)
  • Cost: $500 × 500 = $250,000
  • Alternative vertical: $1,000,000+ for equivalent capacity
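The arithmetic above can be checked in a few lines (the $500-per-laptop price and the $1M vertical figure are the article's illustrative numbers, not real hardware prices):

```python
# Back-of-the-envelope comparison of the horizontal scaling path
# (illustrative numbers from the text above).
laptops = 500
disk_per_laptop_gb = 40
cost_per_laptop = 500  # dollars

horizontal_capacity_tb = laptops * disk_per_laptop_gb / 1000  # 20 TB total
horizontal_cost = laptops * cost_per_laptop                   # $250,000

print(f"{horizontal_capacity_tb:.0f} TB for ${horizontal_cost:,}")
```

Twice the capacity of the 10 TB server-grade machine, at a quarter of the quoted vertical cost.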

Advantages:

  • Cost-effective scaling
  • No hardware limits
  • Redundancy (one failure doesn't kill system)

The catch: Massive complexity increase.

The Complexity Challenge

With horizontal scaling, simple questions become hard:

Question 1: Which server IP should DNS register?

  • You have 500 servers
  • DNS needs to return an IP address
  • Which one?

Question 2: How do users reach the right server?

  • User data lives on specific servers
  • Wrong server = "your data is gone"
  • Need consistent routing

Question 3: What if a server crashes?

  • 500 servers = higher failure probability
  • Need automatic failover
  • Can't manually redirect traffic

Enter: The Load Balancer 🎯

Solution: Designate one server as a "load balancer."

Users → Load Balancer → 500 Application Servers

Load balancer responsibilities:

1. Unified view. Users see "one system," not 500 servers. It's like connecting to google.com: you don't care that Google runs millions of servers behind it.

2. Request distribution. Route each request to an appropriate server according to a routing algorithm (round-robin, least-connections, hash-based, etc.).
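Two common routing strategies can be sketched in a few lines: round-robin for stateless requests, and hash-based routing when a user's requests must consistently land on the server holding their data. The server list and user IDs below are made up for illustration:

```python
import hashlib
from itertools import cycle

servers = [f"10.0.0.{i}" for i in range(1, 6)]  # 5 app servers, illustrative IPs

# Round-robin: spread stateless requests evenly across all servers.
_rr = cycle(servers)

def route_round_robin() -> str:
    return next(_rr)

# Hash-based: the same user always maps to the same server,
# so their data is found on the server where it was written.
def route_by_user(user_id: str) -> str:
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

Hash-based routing is what prevents the "wrong server = your data is gone" problem from the previous section.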

Load Balancer Implementation

How it works:

Install specialized software on one server:

  • Nginx
  • HAProxy
  • AWS Elastic Load Balancer
  • Cloud Load Balancer

This server becomes the "front door":

  • DNS registers load balancer's IP
  • All traffic goes to load balancer first
  • Load balancer forwards to application servers

Server Health Tracking

Problem: Load balancer must know which servers are alive.

Two mechanisms:

Health Check (Pull-Based) ✅ Preferred

```python
# Load balancer pings every server periodically (e.g. every 5 seconds)
for server in servers:
    response = ping(server, "/health")
    if response.timed_out():
        server.consecutive_failures += 1
        if server.consecutive_failures >= 3:
            mark_as_dead(server)
    else:
        server.consecutive_failures = 0
```

Why preferred: the load balancer owns the monitoring logic, so application servers carry no extra responsibility.
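A runnable toy version of the pull-based loop, with `probe` standing in for a real HTTP call to `/health` (all names here are illustrative, not a real load balancer API):

```python
FAILURE_THRESHOLD = 3  # consecutive failed probes before a server is declared dead

def check(failures: dict, probe) -> list:
    """One health-check round.

    failures maps server name -> consecutive-failure count (mutated in place);
    probe(name) returns True when the server's /health endpoint responds.
    Returns the servers that crossed the failure threshold this round.
    """
    dead = []
    for name in failures:
        if probe(name):
            failures[name] = 0       # a success resets the counter
        else:
            failures[name] += 1
            if failures[name] >= FAILURE_THRESHOLD:
                dead.append(name)
    return dead
```

In practice this function would run on a 5-second timer, and `probe` would be an HTTP GET with a short timeout.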

Heartbeat (Push-Based)

```python
# Each application server pings the load balancer every 5 seconds
send_ping_to_load_balancer("I'm alive")

# Load balancer tracks the last ping time per server
for server in servers:
    if no_ping_received_for(server, seconds=15):
        mark_as_dead(server)
```

Less common: Adds responsibility to application servers.
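The load-balancer side of the heartbeat can be sketched as a tracker that records the last ping timestamp per server, using the 15-second timeout from the text (the class and its methods are illustrative, not a real product API):

```python
import time

HEARTBEAT_TIMEOUT = 15  # seconds of silence before a server is presumed dead

class HeartbeatTracker:
    """Servers call record() on every heartbeat; dead_servers() lists
    anyone who has been silent longer than the timeout."""

    def __init__(self, now=time.monotonic):
        self.now = now        # injectable clock, handy for testing
        self.last_seen = {}   # server name -> timestamp of last ping

    def record(self, server: str) -> None:
        self.last_seen[server] = self.now()

    def dead_servers(self) -> list:
        cutoff = self.now() - HEARTBEAT_TIMEOUT
        return [s for s, t in self.last_seen.items() if t < cutoff]
```

Note the design difference from the pull model: here the load balancer is passive, and a server that stops pushing is simply aged out.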

Load Balancer Performance 💨

Application Server Complexity

Application servers do heavy lifting:

  • Encryption/decryption
  • Request deserialization
  • Authorization checks
  • Database queries
  • Business logic
  • Response serialization

Capacity: 100-1,000 requests/second

Load Balancer Simplicity

Load balancers are lightweight:

  • Inspect IP addresses
  • Forward requests
  • Minimal processing

Capacity: ~100,000 requests/second

Roughly 100-1,000x the throughput of an application server!

The Load Balancer Bottleneck

Problem: Even powerful load balancers have limits.

  • Google's scale: ~10 million requests/second
  • Single load balancer: ~100,000 requests/second max

One load balancer won't suffice.

Solution: Multiple Load Balancers

Users → DNS → Multiple Load Balancers → Application Servers

DNS returns multiple IP addresses:

  • Load Balancer 1: 1.2.3.4
  • Load Balancer 2: 1.2.3.5
  • Load Balancer 3: 1.2.3.6

Client picks:

  • Nearest load balancer (geographic)
  • Random load balancer
  • First available load balancer

Now DNS acts as a load balancer for load balancers!
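Client-side selection among the DNS-returned addresses can be sketched like this (the IPs are the illustrative ones above; in real code a resolver call such as `socket.getaddrinfo` would supply them, and `is_reachable` stands in for an actual connection attempt):

```python
import random

# Addresses DNS might return for one hostname (illustrative).
load_balancer_ips = ["1.2.3.4", "1.2.3.5", "1.2.3.6"]

def pick_random(ips: list) -> str:
    """Random choice: the simplest strategy; load evens out across many clients."""
    return random.choice(ips)

def pick_first_available(ips: list, is_reachable) -> str:
    """First-available: try addresses in order, fall through on failure."""
    for ip in ips:
        if is_reachable(ip):
            return ip
    raise ConnectionError("no load balancer reachable")
```

Geographic ("nearest") selection usually happens on the DNS side instead, by returning different IPs depending on where the query comes from.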


Key Takeaways

  1. Vertical scaling is limited by hardware ceilings
  2. Horizontal scaling is cost-effective but complex
  3. Load balancers solve the complexity by providing unified interface
  4. Health checks keep track of server availability
  5. Multiple load balancers prevent bottlenecks at scale