Write-Back Cache Strategy: High-Throughput Systems with Acceptable Data Loss

March 5, 2026 · 4 min read
system design, high level design, HLD, distributed systems, scalability, microservices, load balancing, caching, database design, API design, software architecture

What is Write-Back Cache? 🔄

Definition: Writes go to the cache first, then asynchronously flush to the database later.

Flow:

  1. Client sends write request
  2. App server writes to cache (fast)
  3. App server returns success to client immediately
  4. Cache periodically dumps data to database (background process)

Key characteristic: Database writes are deferred (not immediate).
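The four steps above can be sketched as a minimal write-back cache. This is a toy Python sketch, not a production design: the `put(key, value)` database interface, the dirty-key tracking, and the fixed flush interval are all assumptions for illustration.

```python
import threading
import time

class WriteBackCache:
    """Minimal write-back cache: writes land in memory immediately
    and are flushed to the database later by a background process."""

    def __init__(self, db, flush_interval=30):
        self._data = {}            # in-memory store (the cache)
        self._dirty = set()        # keys changed since the last flush
        self._lock = threading.Lock()
        self._db = db              # assumed to expose put(key, value)
        self._interval = flush_interval

    def write(self, key, value):
        # Steps 2-3: write to cache, then the caller returns
        # success to the client immediately.
        with self._lock:
            self._data[key] = value
            self._dirty.add(key)

    def flush(self):
        # Step 4: dump only the keys that changed since the last flush.
        with self._lock:
            dirty, self._dirty = self._dirty, set()
            snapshot = {k: self._data[k] for k in dirty}
        for key, value in snapshot.items():
            self._db.put(key, value)

    def run_flusher(self):
        # Background loop; anything written between flushes is at risk
        # if the cache crashes before the next one.
        while True:
            time.sleep(self._interval)
            self.flush()
```

Note that between two flushes the database lags the cache; that window is exactly the data at risk on a crash.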

When to Use Write-Back Cache ✅

Ideal for: High-throughput systems where occasional data loss is acceptable.

Two Key Requirements

  1. Extremely high write volume — Millions of writes per second
  2. 1-2% data loss tolerance — Losing some data is not catastrophic

Use Case 1: View Counting on YouTube/Twitter 👁️

The Problem

Scenario: Billions of users watching videos on YouTube.

Challenge:

  • Millions of view requests per second
  • Writing each view to disk (database) is too slow
  • Need extremely high throughput

The Write-Back Solution

Implementation:

  1. User watches a video → Increment view counter in cache (RAM)
  2. Cache accumulates views in memory
  3. Every 30 seconds, dump aggregated count to database
  4. If cache crashes before dump, some views are lost

Example:

Cache:    Video X views = 1,523,891 (in memory)
Database: Video X views = 1,520,000 (last dump 30 seconds ago)

[Cache crashes before next dump]

Result: ~3,891 views lost
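The counting pattern could be sketched as follows. This is a simplified Python sketch: the `increment(video_id, n)` database method is a hypothetical interface, and a real system would also need locking and a crash-tolerant flush scheduler.

```python
from collections import Counter

class ViewCounter:
    """Accumulate view increments in RAM; periodically flush the
    deltas to the database. A crash loses at most one interval's
    worth of uncounted views."""

    def __init__(self, db):
        self._pending = Counter()   # video_id -> views since last flush
        self._db = db               # assumed to expose increment(id, n)

    def record_view(self, video_id):
        # Hot path: a pure in-memory increment, no disk I/O.
        self._pending[video_id] += 1

    def flush(self):
        # Called every ~30 seconds: one database write per video,
        # instead of one write per individual view.
        pending, self._pending = self._pending, Counter()
        for video_id, delta in pending.items():
            self._db.increment(video_id, delta)
```

The key win is aggregation: a million views of one video in a 30-second window collapse into a single database write.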

Why This Is Acceptable

Question: Is losing 1-2% of views catastrophic?

Answer: No, because we care about trends, not individual data points.

Rationale:

  • Overall trend remains accurate (millions of views)
  • Losing ~3,891 out of ~1.52 million views is roughly a 0.26% error
  • Nobody notices or cares about this margin
  • Benefit: Massively higher throughput (can handle millions of requests/second)

Server Crash Frequency

How often do servers crash? Approximately once per year (for well-maintained systems).

Impact: If cache dumps to database every 30 seconds, and the server crashes once per year, you lose at most 30 seconds of view data — negligible in the grand scheme.

Use Case 2: Multiplayer Gaming (PUBG, CS:GO, Dota) 🎮

The Problem

Scenario: 10-100 players in a live game match.

Challenge:

  • Real-time game state (player positions, health, ammo, kills)
  • Extremely high write frequency (every player action)
  • Writing every action to database is too slow

The Write-Back Solution

Architecture:

10 Players → App Server (local cache stores game state) → Database

Implementation:

  1. All players in a match connect to one app server
  2. App server stores entire game state in local cache (RAM)
  3. During the match: All updates stay in cache (no database writes)
  4. Match ends successfully: Dump final statistics to database
    • Winner
    • Kill count per player
    • Death count
    • MVP
    • Cheater detection results
  5. Server crashes mid-match: Game is lost (cannot resume)
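The match lifecycle above could be sketched like this. A Python toy, not a real game server: the `save_summary` database method and the kill/death-only state are illustrative assumptions.

```python
class MatchState:
    """All match state lives in the app server's RAM during play;
    only the final summary is persisted when the match ends."""

    def __init__(self, players):
        self.kills = {p: 0 for p in players}
        self.deaths = {p: 0 for p in players}

    def record_kill(self, killer, victim):
        # During the match: pure in-memory updates, no database writes.
        self.kills[killer] += 1
        self.deaths[victim] += 1

    def finish(self, db):
        # Match ended successfully: dump final statistics once.
        mvp = max(self.kills, key=self.kills.get)
        db.save_summary({
            "kills": self.kills,
            "deaths": self.deaths,
            "mvp": mvp,
        })
```

If the server dies before `finish` runs, the whole match is simply gone, which is exactly the trade-off described above.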

Why This Is Acceptable

Question: If the server crashes and all game data is lost, isn't that bad?

Answer: Yes, but it's acceptable for this use case.

Rationale:

  • Server crashes are rare (once per year)
  • If a crash happens, players simply start a new game
  • Users accept this risk in exchange for smooth, lag-free gameplay
  • Alternative (writing every action to database) would make the game unplayable due to latency

Not stored in database during match:

  • Every bullet fired
  • Every player movement
  • Every health/ammo change

Stored in database after match:

  • Final game metadata (winner, kills, deaths, MVP)
  • Game recording (for replay/review)

Write-Back Cache: Key Takeaways 🎯

When to Use

  • High-throughput systems (millions of writes/second)
  • Data loss of 1-2% is acceptable
  • Trend accuracy matters more than individual precision

Examples:

  • View counts (YouTube, Twitter, Instagram)
  • Like counts (social media)
  • Analytics dashboards
  • Gaming sessions
  • Leaderboards (with periodic updates)

When NOT to Use

  • Financial transactions (bank transfers, payments)
  • Critical user data (account passwords, personal information)
  • Legal records (contracts, compliance data)
  • Medical records (patient data)
  • Any system where data loss is unacceptable

Trade-off Summary

Gain:

  • Massive throughput increase (10-100x faster writes)
  • Near-zero write latency for users

Cost:

  • Risk of data loss on cache failure
  • Eventual consistency (data may be stale until next flush)

Advanced Topic: Bloom Filters for Non-Existent Keys 🔍

Problem: What if users frequently request keys that don't exist in the database?

Scenario:

  1. User requests non-existent key
  2. Cache miss → Check database
  3. Database doesn't have the key either
  4. Return empty result
  5. This happens repeatedly → Database gets overwhelmed with useless queries

Solution: Bloom filters provide a fast containment check before querying the database.

How Bloom filters work:

  • Probabilistic data structure
  • Quickly answers: "Is this key definitely NOT in the database?"
  • If Bloom filter says "not present" → Skip database query entirely
  • If Bloom filter says "maybe present" → Check database
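A minimal sketch of the idea in Python. Real deployments typically use a library or the datastore's built-in filter rather than hand-rolling one; the sizes and hash scheme here are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: a 'no' answer is definite, a 'maybe'
    answer can be a false positive, but never a false negative."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)   # bit array (one byte per bit here)

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # All bits set -> "maybe present" (check the database).
        # Any bit unset -> "definitely not present" (skip the query).
        return all(self.bits[pos] for pos in self._positions(key))
```

In the cache-miss path, a `might_contain` returning `False` lets the server skip the database round-trip for a key that cannot exist.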

Note: Bloom filters will be covered in detail during the NoSQL internals lecture. For now, understand they optimize cache misses for non-existent keys.


Next: CDN infrastructure deep dive — ISP partnerships and redundancy architecture.