Cache Invalidation and TTL: Managing Data Freshness and Staleness

March 5, 2026 · 4 min read
system design · high level design · HLD · distributed systems · scalability · microservices · load balancing · caching · database design · API design · software architecture

The Invalidation Problem 🔴

Fundamental issue: Cache is not the source of truth. The database is.

Scenario: Stale Data

Initial state:

  • Database: A = 10
  • Cache: A = 10

Database update:

  • Database: A = 20 (updated)
  • Cache: A = 10 (stale)

Read request arrives:

  1. Check cache
  2. Cache returns A = 10 (stale value)
  3. Return to user without checking database

Result: User receives outdated data (10 instead of 20).

This is why invalidation algorithms are essential. Without invalidation, the cache serves stale information indefinitely.
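The stale-read scenario above can be sketched in a few lines. This is a minimal illustration (the dict-based `cache` and `database` are stand-ins for a real cache and data store), not a production pattern:

```python
cache = {"A": 10}
database = {"A": 10}

def read(key):
    # Cache-aside read: return the cached value without consulting
    # the database when the key is present.
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value

database["A"] = 20   # database updated; cache NOT invalidated
print(read("A"))     # → 10 (stale), even though the database holds 20
```

Because nothing ever tells the cache that `A` changed, every subsequent read keeps returning 10.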

TTL (Time-to-Live): Basic Invalidation Policy ⏰

How TTL Works

Core concept: Every cached entry has an expiration timestamp. After expiration, the entry is marked as invalid.

Example:

  • Write A = 10 to cache at 8:00 AM
  • TTL configuration: 1 hour
  • Expiry timestamp: 8:00 AM + 1 hour = 9:00 AM

Cache storage:

Key: A | Value: 10 | Expiry: 9:00 AM

TTL Read Behavior

Scenario 1: Read Before Expiry (8:30 AM)

  1. Read request for A arrives
  2. Current time: 8:30 AM
  3. Expiry time: 9:00 AM
  4. 8:30 AM < 9:00 AM → Valid
  5. Cache returns A = 10

Scenario 2: Read After Expiry (9:15 AM)

  1. Read request for A arrives
  2. Current time: 9:15 AM
  3. Expiry time: 9:00 AM
  4. 9:15 AM > 9:00 AM → Expired
  5. Cache deletes A
  6. Cache returns 404 (key not found)
  7. App server fetches from database

Common Misconception: TTL is NOT Eviction ⚠️

Many sources (including Redis documentation) classify TTL as an eviction policy. This is incorrect.

Why TTL is Invalidation, Not Eviction

Eviction:

  • Trigger: Write request (need to add new data, cache is full)
  • Purpose: Free up space

Invalidation (TTL):

  • Trigger: Read request (check if data is still valid)
  • Purpose: Ensure data freshness

TTL doesn't free space when the cache is full. It marks data as invalid based on time, regardless of space constraints.

Eager vs. Lazy Deletion 🐢

Eager Deletion (BAD Approach)

Idea: Automatically delete expired entries exactly when they expire.

Implementation:

  1. Maintain a priority queue sorted by expiry time
  2. Run background timer (checks every 1-2 seconds)
  3. Scan priority queue, delete expired entries

Problems:

Performance jitter: If 10,000 entries expire at 9:00 AM, the background process gets overwhelmed. Cache requests arriving during this deletion spike experience delays.

Unpredictable performance: Sometimes fast (no deletions happening), sometimes slow (mass deletions in progress).

Extra overhead: Background timer consumes CPU and memory.

Lazy Deletion (CORRECT Approach)

Idea: Only delete expired entries when a read request arrives.

Implementation:

  1. Read request for A arrives at 10:00 AM
  2. Cache checks: Expiry = 9:00 AM, Current = 10:00 AM
  3. Entry expired → Delete and return 404
  4. App server fetches from database

Key point: Even though the entry expired at 9:00 AM, it remains in the cache until 10:00 AM (when the read request triggers deletion).

Doesn't Lazy Deletion Waste Space?

Question: If expired entries stay in the cache, doesn't that waste memory?

Answer: No, because eviction handles space management.

Remember: Eviction and invalidation run simultaneously:

  • Invalidation (TTL): Marks data as stale based on time
  • Eviction (LRU): Removes least-used data to free space

If A expires at 9:00 AM but no read request comes until 10:00 AM, LRU eviction will likely remove it before 10:00 AM (since it hasn't been accessed recently).

Division of responsibility:

  • Invalidation: Ensure data freshness
  • Eviction: Manage space
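The division of responsibility can be sketched by combining the two mechanisms in one structure. This is an illustrative single-threaded sketch built on `OrderedDict` (capacity and TTL values are arbitrary); real caches implement both more efficiently:

```python
import time
from collections import OrderedDict

class Cache:
    """LRU eviction (space) + TTL invalidation (freshness)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()  # key -> (value, expiry); order = recency

    def put(self, key, value, ttl_seconds):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (value, time.time() + ttl_seconds)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # eviction: drop LRU entry

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if time.time() > expiry:
            del self._store[key]              # invalidation: stale entry
            return None
        self._store.move_to_end(key)          # mark as recently used
        return value
```

Eviction runs on writes (when the cache is over capacity); invalidation runs on reads (when an entry has expired). An expired-but-unread entry simply sits at the cold end of the LRU order until one or the other removes it.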

TTL and Eventual Consistency ⏳

Does TTL Provide Strong Consistency?

No. TTL provides eventual consistency.

Scenario:

  • Cache entry: A = 10, Expiry: 9:00 AM
  • Database updated at 8:30 AM: A = 20
  • Read request arrives at 8:45 AM

What happens:

  1. Check cache
  2. Expiry = 9:00 AM, Current = 8:45 AM → Valid
  3. Cache returns A = 10 (stale data)
  4. User receives outdated value

Why stale? The database updated to 20, but the cache still holds 10 (and won't expire until 9:00 AM).

Consistency guarantee: Eventually (after 9:00 AM), reads will fetch fresh data from the database. But before expiry, stale reads are possible.

Asynchronous Cache Updates 🔄

Question: When fetching data from the database on a cache miss, should we update the cache synchronously or asynchronously?

Answer: Asynchronous updates are preferred to reduce user latency.

Flow:

  1. App server fetches data from database (cache miss)
  2. App server returns data to user immediately
  3. App server updates cache in the background (parallel process)

Why asynchronous? The user doesn't wait for the cache update to complete. Response time is faster.
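The flow above can be sketched with a background thread (a real service would more likely use a worker pool or task queue; the per-request thread here is purely illustrative):

```python
import threading
import time

database = {"A": 20}
cache = {}

def update_cache(key, value):
    # Simulated slow cache write running in the background.
    time.sleep(0.05)
    cache[key] = value

def handle_read(key):
    if key in cache:
        return cache[key]
    value = database[key]          # cache miss → fetch from database
    # Fire-and-forget: populate the cache in the background so the
    # user response is not delayed by the cache write.
    threading.Thread(target=update_cache, args=(key, value)).start()
    return value                   # returned before the cache is updated

print(handle_read("A"))   # → 20, without waiting for the cache write
```

The trade-off: for a brief window the cache is still empty (or stale), so a second read arriving immediately may miss again and hit the database too.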


Next: Consistency models — strong vs. eventual consistency in distributed systems.