Server Infrastructure and Database Design at Scale

March 5, 20263 min read
system designhigh level designHLDdistributed systemsscalabilitymicroservicesload balancingcachingdatabase designAPI designsoftware architecture

Dedicated Server Requirements 🖥️

The Personal Device Problem

Issue: Using personal computers as servers introduces critical limitations:

  • ❌ Downtime: Machine shutdowns = service outages
  • ❌ Resource contention: Personal use (gaming, downloads) impacts server performance
  • ❌ Bandwidth sharing: Other activities degrade service response times

Solution: Dedicated server infrastructure

Characteristics of dedicated servers:

  • ✅ 24/7 uptime (no sleep states, no shutdowns)
  • ✅ Single-purpose allocation (exclusively serves application traffic)
  • ✅ Dedicated resources (all RAM, CPU, bandwidth for service)
ℹ️ Principle: Personal computing devices and production servers are architecturally incompatible roles.

Database Schema Design: MVP Approach 🗄️

Core Requirements

For the bookmarking service MVP:

  • User authentication (register/login)
  • Store bookmarks
  • Retrieve user's bookmarks

Minimal Schema

Table: users

  • id (integer, 8 bytes)
  • username (string)
  • password (string)

Table: user_bookmarks

  • user_id (integer, 8 bytes, foreign key)
  • url (string, VARCHAR(1000))

Excluded Features (Beyond MVP Scope)

  • ❌ Bookmark titles
  • ❌ Favicons
  • ❌ Folder structure
  • ❌ Tags/categories
  • ❌ Metadata
ℹ️ MVP principle: Store only essential data user identity and URL. Additional features add complexity without validating core functionality.

Data Type Selection: Integer Size Matters 💥

Why 8-Byte Integers?

Integer capacity by size:

  • 1 byte: 0 to 255
  • 4 bytes (32-bit): 0 to ~4 billion (unsigned)
  • 8 bytes (64-bit): 0 to ~18 quintillion

Scale consideration:

  • Global internet users: ~5 billion
  • 4-byte integers: insufficient (max ~4 billion)
  • Conclusion: Minimum 8 bytes required for user IDs

Case Study: YouTube Integer Overflow

The Gangnam Style Incident (2012):

Setup:

  • YouTube used 4-byte signed integers for view counts
  • Maximum value: ~2.147 billion

What happened:

  • "Gangnam Style" exceeded 2.147 billion views
  • Integer overflowed
  • Result: View count displayed as negative

Impact: Public-facing bug on the world's most-viewed video

Resolution: YouTube migrated to 8-byte integers system-wide

🚨 Data type selection has real production consequences at scale. Seemingly minor decisions (4 vs 8 bytes) create architectural constraints that become visible under load.

Storage Calculations 📊

Per-Record Storage

Bookmark record:

  • user_id: 8 bytes
  • url: 1,000 bytes (VARCHAR(1000))
  • Total: ~1,008 bytes ≈ 1 KB

URL size justification: URLs can be extensive (query parameters, tracking tokens, session data). Example: Amazon product search URLs routinely exceed 200 characters. 1,000 bytes provides safety margin for worst-case scenarios.

Daily Growth Rate

Traffic assumption: 1 million new bookmarks/day

Calculation:

1 KB/bookmark × 1 million bookmarks = ? 1 KB = 10³ bytes 1 million = 10⁶ bookmarks 10³ × 10⁶ = 10⁹ bytes = 1 GB

Result: 1 GB of new data daily

Long-Term Growth Projection

  • Daily: 1 GB
  • Weekly: 7 GB
  • Monthly: ~30 GB
  • Yearly: 365 GB

Hardware constraint context (2003):

  • Consumer hard drives: ~40 GB
  • Timeline to capacity: ~40 days

Implication: Storage capacity becomes a limiting factor within weeks of viral growth.

Storage Unit Conversions 🧮

Powers of 10 (SI units):

  • Kilo (K): 10³ = 1,000
  • Mega (M): 10⁶ = 1,000,000
  • Giga (G): 10⁹ = 1,000,000,000
  • Tera (T): 10¹²
  • Peta (P): 10¹⁵
  • Exa (E): 10¹⁸

Conversion trick: When multiplying powers of 10, add exponents

Example:

10³ × 10⁶ = 10⁽³⁺⁶⁾ = 10⁹

Therefore: Kilobytes × Million = Gigabytes


The Scaling Crisis 😱

Growth trajectory:

  • Day 40: Hard drive capacity exhausted
  • Day 365: 365 GB accumulated (requires enterprise storage)

Unaddressed concerns:

  • ⚡ Query performance degradation as database grows
  • 🔥 Server capacity under concurrent user load
  • 🌊 Network bandwidth constraints
  • 💥 Latency increase as data volume scales

The fundamental problem: Single-server architecture with consumer hardware cannot sustain viral-scale growth. Storage is the first bottleneck, but not the only one.


Key Takeaways 💡

  1. Dedicated servers are essential for production systems.
  2. Data type selection has scalability implications.
  3. Storage growth is predictable through calculation.
  4. MVP database schemas should be minimal.
  5. Single-server architectures have hard limits.