Backend Caching: Global vs. Local Cache Architectures

March 5, 2026 · 3 min read
Tags: system design · high level design · HLD · distributed systems · scalability · microservices · load balancing · caching · database design · API design · software architecture

Where Backend Caches Fit ⚙️


Architecture layers:

```
Client
  → DNS
  → CDN (for static content)
  → Load Balancer
  → Application Servers (stateless)
  → [CACHE LAYER]        ← Backend cache sits here
  → Database Servers
```

Why backend caching matters:

Application servers are stateless (they don't store data). Every data request goes to the database, which is slow. Adding a cache layer between application servers and databases dramatically improves performance.

Type 1: Global Cache 🌐

Architecture:

  • Cache exists as a separate microservice
  • Sits between application servers and database
  • Shared across all application servers (hence "global")

Example: Redis cluster

All app servers can read/write to the same cache instance. This provides consistency — if Server A caches a value, Server B can retrieve it immediately.

Code Example: Global Cache (Redis)

```python
def get_user_preferences(request):
    user_id = request.user_id

    # Check global cache (Redis)
    preferences = RedisClient.get_key(f"preferences:{user_id}")
    if preferences is None:
        # Cache miss - fetch from database
        preferences = SQLClient.get_preferences(user_id)
        # Update global cache
        RedisClient.set_key(f"preferences:{user_id}", preferences)
    return preferences
```

How it works: All app servers access the same Redis cluster. If Server A caches data, Server B can retrieve it immediately.

Type 2: Local Cache 💾

Architecture:

  • Cache exists inside each application server
  • Each server has its own private cache (RAM or disk)
  • Not shared between servers

Storage location:

  • In-memory (RAM) — fastest
  • On-disk (hard drive) — slower but persistent

Key characteristic: Each app server has a private cache. Server A cannot access Server B's cache. This can make servers somewhat stateful (storing temporary data).

Code Example: Local Cache (In-Memory Dictionary)

```python
# Local cache stored in app server RAM
user_preferences_cache = {}

def get_user_preferences(request):
    user_id = request.user_id

    # Check local cache (dictionary in RAM)
    if user_id in user_preferences_cache:
        return user_preferences_cache[user_id]

    # Cache miss - fetch from database
    preferences = SQLClient.get_preferences(user_id)

    # Update local cache
    user_preferences_cache[user_id] = preferences
    return preferences
```

How it works: Each app server maintains its own dictionary in RAM. Data cached by Server A is not accessible to Server B.
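One caveat with the plain dictionary above: it grows without bound. A common refinement is to cap the cache size and evict the least-recently-used entry when full. A minimal sketch using Python's `OrderedDict` (the `LRUCache` class and keys here are illustrative, not from the original example):

```python
from collections import OrderedDict

class LRUCache:
    """Bounded local cache: evicts the least-recently-used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None          # cache miss
        self._data.move_to_end(key)   # mark as recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the coldest entry

# Usage: same cache-aside pattern as the dictionary example above
cache = LRUCache(capacity=2)
cache.set("user:1", {"theme": "dark"})
cache.set("user:2", {"theme": "light"})
cache.get("user:1")                      # touch user:1 so it stays warm
cache.set("user:3", {"theme": "auto"})   # evicts user:2, the coldest entry
print(cache.get("user:2"))  # → None (evicted)
```

In production you would more likely reach for `functools.lru_cache` or a library like `cachetools`, but the eviction logic is the same idea.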

What Data Gets Cached? 📊

Two criteria for caching:

  1. Frequently accessed data — Data requested often (e.g., user profiles, product catalogs)
  2. Expensive-to-compute data — Data requiring complex joins or heavy computation

Example: LinkedIn Profile Page

A single profile page aggregates data from multiple database tables:

  • user_education — University, degrees
  • user_location — Geographic data
  • user_connections — Network size
  • user_profile_views — View count
  • user_posts — Recent activity
  • user_skills — Endorsements

Loading this page requires joining 50+ tables, which is computationally expensive.

Solution: Cache the entire aggregated profile data to avoid repeated expensive queries.
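Expensive aggregated data like this is usually cached with a time-to-live (TTL), trading a few minutes of staleness for skipping the join. A self-contained sketch of that idea (the `aggregate_profile_from_db` stub and the 5-minute TTL are hypothetical stand-ins, not the real LinkedIn implementation):

```python
import time

def aggregate_profile_from_db(user_id):
    """Stand-in for the expensive multi-table join."""
    return {"user_id": user_id, "education": [], "skills": []}

profile_cache = {}           # user_id -> (expires_at, profile)
PROFILE_TTL_SECONDS = 300    # tolerate up to 5 minutes of staleness

def get_profile(user_id):
    entry = profile_cache.get(user_id)
    if entry is not None:
        expires_at, profile = entry
        if time.time() < expires_at:
            return profile   # cache hit: the expensive join is skipped

    # Cache miss or expired entry: run the expensive aggregation once
    profile = aggregate_profile_from_db(user_id)
    profile_cache[user_id] = (time.time() + PROFILE_TTL_SECONDS, profile)
    return profile
```

With a real global cache you would store the serialized profile in Redis with an expiry instead of a local dict, but the cache-aside-with-TTL pattern is identical.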

Performance Comparison: Local vs. Global Cache ⚡

Which is faster?

Local cache wins because:

  • RAM access: ~100 nanoseconds
  • SSD access: ~100 microseconds
  • Network round trip (to Redis): ~1 millisecond

Accessing local RAM/disk is orders of magnitude faster than fetching from a network-based cache like Redis.

Single vs. Distributed Caches 🏗️

Single Global Cache

Use case: Small datasets or low request volume

Architecture:

App Servers → Single Redis Server → Database

One Redis instance handles all cache requests. Sufficient when:

  • Cached data fits in one server's memory
  • Request volume doesn't overwhelm a single server
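A quick back-of-envelope check tells you whether a single cache server is enough. The numbers below are hypothetical; plug in your own entry count and entry size:

```python
# Does the cached dataset fit in one Redis server's memory?
num_entries = 10_000_000        # e.g., 10 M cached user profiles (assumed)
bytes_per_entry = 2 * 1024      # ~2 KB serialized value + key overhead (assumed)

total_bytes = num_entries * bytes_per_entry
total_gb = total_bytes / (1024 ** 3)
print(f"Estimated cache size: {total_gb:.1f} GB")  # → Estimated cache size: 19.1 GB
```

Under these assumptions the working set fits comfortably on one 32 GB server; if your estimate exceeds a single server's RAM, you need the distributed setup described next.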

Distributed Global Cache

Use case: Large datasets or high request volume

Architecture:

App Servers → Load Balancer → Redis Cluster (multiple servers) → Database

Multiple Redis servers behind a load balancer. Required when:

  • Cached data exceeds single server capacity
  • Request volume requires horizontal scaling
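To spread keys across multiple cache servers, each key must map deterministically to one node. The simplest scheme is hash-mod sharding; a minimal sketch (the node names are hypothetical placeholders):

```python
import hashlib

# Hypothetical node list -- in production these would be your Redis endpoints.
CACHE_NODES = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]

def node_for_key(key: str) -> str:
    """Hash-mod sharding: every key maps to exactly one cache node."""
    digest = hashlib.md5(key.encode()).hexdigest()
    index = int(digest, 16) % len(CACHE_NODES)
    return CACHE_NODES[index]

print(node_for_key("preferences:42"))  # always routes to the same node
```

The drawback: adding or removing a node changes `len(CACHE_NODES)`, remapping most keys at once and causing a flood of cache misses. That problem is what consistent hashing solves.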

Local Cache Distribution

Question: Can local caches be single or distributed?

Answer: Local caches are always distributed by default.

Why?

  • Production systems always run multiple app servers (for redundancy)
  • If you only have one app server and it crashes, the entire service goes down
  • Each app server has its own local cache
  • Therefore, local caches are inherently distributed (though each cache is private)

Next: Load balancing algorithms for distributed caches — consistent hashing vs. round-robin.