Cache Eviction Policies: LRU, FIFO, LFU, and the Principle of Locality
The Eviction Problem 🗑️
Scenario: Your cache is full, but you need to add new data.
The fridge joke:
- How do you put an elephant in the fridge? Open it, put the elephant in, close it.
- How do you put a giraffe in the fridge? Open it, take the elephant out, put the giraffe in, close it.
Eviction = Forceful removal of existing data to make space for new data (like a landlord evicting a tenant who doesn't pay rent).
Common Eviction Policies 📋
Standard eviction policies (from DSA):
LRU (Least Recently Used): Hash map + doubly linked list (O(1) lookup and eviction)
FIFO (First In, First Out): Queue implementation
LIFO (Last In, First Out): Stack implementation
LFU (Least Frequently Used): Frequency-based priority queue
MRU (Most Recently Used): Reverse of LRU; evicts the most recently used item first
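As a sketch, LRU maps nicely onto Python's `OrderedDict`, which bundles the hash map and the doubly linked list into one structure (the class and method names here are illustrative, not a standard API):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # order of keys tracks recency of use

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used key
cache.put("c", 3)      # cache is full, so "b" (least recently used) is evicted
print(cache.get("b"))  # → None
print(cache.get("a"))  # → 1
```

Every `get` and `put` is O(1), which is why this hash-map-plus-linked-list shape is the standard LRU implementation.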
Data Locality Principles 📍
Temporal Locality (Time-Based)
Principle: If data is accessed now, it will likely be accessed again soon.
Example: You open a file → High probability you'll open it again within 5-15 minutes.
LRU is based on temporal locality: Keep recently-accessed items, evict old items.
Spatial Locality (Physical Proximity)
Principle: If you access element D, you'll likely access nearby elements (C, E, F) next.
Example: Array [A, B, C, D, E, F, G, H]
- Just accessed D
- High probability next access: C, E, or F
- Low probability next access: A or H
Application: Memory paging uses spatial locality (load adjacent memory pages).
Limitation: Spatial locality isn't always definable (e.g., non-sequential data structures). Temporal locality is universally applicable.
Choosing an Eviction Policy: The Simple Answer 🎯
For 99% of use cases: Use LRU. Don't overthink it.
The Decision Framework
Step 1: Use LRU by default (no thinking required)
Step 2: Optimize everything else in your system first
- Database queries
- Network latency
- Code performance
- Infrastructure scaling
Step 3: Only after exhausting all other optimizations, consider alternative eviction policies
Step 4: Don't guess which policy is best. Benchmark all of them with production data.
Why Not Guess?
Bad approach: "My use case is X, so policy Y should work best" (assumptions lead to poor decisions)
Good approach:
- Implement LRU, FIFO, LFU
- Benchmark with real traffic
- Choose based on data, not assumptions
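A minimal sketch of that benchmark-don't-guess approach, with a tiny hand-built trace standing in for real production traffic (the function names are illustrative):

```python
from collections import OrderedDict, deque

def lru_hit_count(trace, capacity):
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits

def fifo_hit_count(trace, capacity):
    cache = set()
    order = deque()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1                      # FIFO never refreshes on a hit
        else:
            cache.add(key)
            order.append(key)
            if len(cache) > capacity:
                cache.discard(order.popleft())  # evict oldest insertion
    return hits

trace = ["a", "b", "a", "c", "b"]
print(lru_hit_count(trace, 2))   # → 1
print(fifo_hit_count(trace, 2))  # → 2
```

Even this five-access trace shows why guessing fails: FIFO beats LRU here, which few people would predict from theory alone. Replay your real traffic through each policy and let the hit rates decide.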
Analogy: Touching a hot pan → You pull your hand back immediately, without deliberating. Reaching for LRU should be the same kind of reflex.
Eviction vs. Invalidation: Key Differences 🔄
Many developers confuse these two concepts:
Eviction (Space Problem)
Trigger: Cache is full
Solution: Remove least important data to free up space
Criteria: Usage patterns (LRU, LFU, etc.)
Example: Refrigerator is full → Remove eggs to add apples
Invalidation (Freshness Problem)
Trigger: Data updated in database
Solution: Mark cached data as invalid/stale
Criteria: Data consistency requirements
Example: Rotten eggs in fridge → Throw them out immediately
Both problems exist simultaneously. Any production cache system must implement both eviction and invalidation strategies.
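A sketch of a cache that handles both concerns at once, assuming LRU for eviction and a fixed TTL for invalidation (the explicit `now` parameter is an assumption made so the example is testable without sleeping):

```python
from collections import OrderedDict

class TtlLruCache:
    """LRU eviction (space problem) + TTL invalidation (freshness problem)."""

    def __init__(self, capacity: int, ttl_seconds: float):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.data = OrderedDict()  # key -> (value, stored_at)

    def get(self, key, now: float):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl:  # invalidation: the data went stale
            del self.data[key]
            return None
        self.data.move_to_end(key)      # eviction bookkeeping: refresh recency
        return value

    def put(self, key, value, now: float):
        self.data[key] = (value, now)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:   # eviction: out of space
            self.data.popitem(last=False)    # drop least recently used

cache = TtlLruCache(capacity=2, ttl_seconds=10)
cache.put("x", 1, now=0.0)
print(cache.get("x", now=5.0))   # → 1 (fresh, within TTL)
print(cache.get("x", now=20.0))  # → None (invalidated: too old)
```

Note the two removal paths are independent: `popitem` fires because space ran out, the TTL check fires because the data aged out, and a production cache needs both.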
The House Analogy 🏠
Eviction (No Longer Worth Keeping):
- Tenant stops paying rent → Landlord evicts them (the space is needed for a paying tenant)
Invalidation (No Longer Valid):
- Tenant dies or moves away → The rent agreement itself is void (the data is no longer valid)
When to Use Alternatives to LRU 🤔
Rarely. If you've optimized everything else and still need performance gains:
- Implement all policies (LRU, FIFO, LFU, MRU)
- Benchmark with production traffic
- Measure cache hit rates for each policy
- Choose based on empirical data
Never assume which policy will work best based on theory alone. Real-world access patterns often defy expectations.
Why LRU Wins 🏆
Temporal locality is universal:
- Recently accessed data is likely to be accessed again
- Works well for most application patterns
- Simple to implement and understand
- Predictable performance
Edge cases where LRU might not be optimal:
- Circular access patterns (A, B, C, A, B, C...)
- Sequential scans of large datasets
- Purely random access patterns
But even in these cases: LRU is "good enough" for 99% of applications.
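The circular-pattern edge case is easy to demonstrate. In this sketch (the `evict_mru` flag is illustrative), LRU scores zero hits on a cyclic scan that slightly exceeds capacity, while MRU does noticeably better:

```python
from collections import OrderedDict

def hit_count(trace, capacity, evict_mru=False):
    """Replay a trace; evict from the LRU end, or the MRU end if evict_mru."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # refresh recency
        else:
            if len(cache) >= capacity:
                cache.popitem(last=evict_mru)  # last=True evicts the MRU end
            cache[key] = True
    return hits

trace = list("abc") * 4  # circular access pattern: a, b, c, a, b, c, ...

print(hit_count(trace, 2))                  # LRU → 0 hits
print(hit_count(trace, 2, evict_mru=True))  # MRU → 5 hits
```

With capacity 2 and a cycle over 3 keys, LRU always evicts exactly the key it will need next. This is the kind of pathological pattern worth knowing about, even though, as above, LRU remains the right default.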
Next: Cache invalidation policies — TTL, lazy deletion, and consistency models.