High-Level Design: Understanding Scale and Building MVPs
What is High-Level Design?
HLD focuses on building systems that handle massive scale: specifically, systems serving billions of users with petabytes of data.
The HLD Approach: Architecture Over Implementation
HLD cannot be practiced at true scale:
Why we can't "just build it":
- Cost: Building billion-user infrastructure costs billions of dollars
- Time: Building and load testing a system at that scale can take years
- Access: Most engineers never work at Google/Meta scale
What we do instead:
- Study real-world case studies
- Analyze architectural patterns
- Design solutions conceptually
- Reason about trade-offs
When asked to "design Twitter," you're not coding Twitter. You're architecting:
- Feature requirements and constraints
- Backend infrastructure patterns
- Scaling strategies for millions of concurrent users
- Data flow and storage architecture
Case Study: The Sorting Problem
The Deceptively Simple Question
Problem: Given a file containing strings, sort them in dictionary order.
Input:
zebra
apple
banana
Expected Output:
apple
banana
zebra
The Naive Solution
From a DSA perspective, this is trivial:
with open('data.txt', 'r') as file:
    lines = file.readlines()
sorted_data = sorted(lines)
Three lines. Built-in sorting. Problem solved... right?
The Scale Constraint
The actual requirement: The file contains 50 petabytes of data.
Understanding petabyte scale:
- 1 KB = 10³ bytes
- 1 MB = 10⁶ bytes
- 1 GB = 10⁹ bytes
- 1 TB = 10¹² bytes
- 1 PB = 10¹⁵ bytes
50 petabytes = 50,000,000 gigabytes
Note on units: 1000 bytes = 1 kilobyte (KB) - SI standard; 1024 bytes = 1 kibibyte (KiB) - binary standard. Industry often uses these interchangeably, though they're technically different.
Why the Naive Solution Fails
lines = file.readlines() # Attempts to load entire file into RAM
Physical constraints:
- RAM limitation: High-end servers typically have ~1-2 TB of RAM, and even specialized high-memory machines reach only tens of terabytes
- Storage limitation: Consumer drives max at ~20 TB; enterprise drives ~100 TB
- 50 PB cannot fit on a single machine
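The gap between 50 PB and a single machine can be made concrete with a little arithmetic. The sketch below assumes the rough per-machine figures quoted above (2 TB of RAM, 100 TB per drive) and SI units. The variable names are illustrative only.

```python
PB = 10**15  # bytes per petabyte (SI)
TB = 10**12  # bytes per terabyte (SI)


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, without going through floats."""
    return -(-a // b)


dataset_bytes = 50 * PB

# Assumed capacities, taken from the rough figures in the list above.
ram_per_server = 2 * TB
capacity_per_drive = 100 * TB

servers_for_ram = ceil_div(dataset_bytes, ram_per_server)
drives_for_one_copy = ceil_div(dataset_bytes, capacity_per_drive)

print(f"Servers needed to hold the data in RAM: {servers_for_ram:,}")
print(f"Drives needed to store a single copy:   {drives_for_one_copy:,}")
```

Under these assumptions the data needs 25,000 servers' worth of RAM, or 500 of the largest drives for one unreplicated copy.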
Where is this data? Distributed across many servers (at ~100 TB per machine, at least 500 of them), often in several data centers.
The problem is now:
- Collect data from distributed servers
- Sort across the entire dataset
- Store results back
This is no longer an algorithmic problem. It's a distributed systems problem.
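One building block for this kind of problem is an external merge sort: sort chunks that fit in memory, spill each to disk as a sorted run, then merge the runs. The sketch below is a single-machine illustration only, not a distributed implementation. The function names and the `max_lines_in_memory` parameter are assumptions made for this example.

```python
import heapq
import os
import tempfile


def _write_sorted_run(lines, tmp_dir):
    """Sort one in-memory chunk and spill it to a temporary run file."""
    lines.sort()
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "w") as run:
        run.writelines(lines)
    return path


def external_sort(input_path, output_path, max_lines_in_memory=100_000):
    """Sort a text file line by line without loading it all into RAM."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths, chunk = [], []
        with open(input_path) as src:
            for line in src:
                # Normalize so a missing final newline cannot fuse two lines.
                chunk.append(line.rstrip("\n") + "\n")
                if len(chunk) >= max_lines_in_memory:
                    run_paths.append(_write_sorted_run(chunk, tmp_dir))
                    chunk = []
        if chunk:
            run_paths.append(_write_sorted_run(chunk, tmp_dir))

        # k-way merge: heapq.merge reads the runs lazily, so memory use
        # depends on the number of runs, not on the total data size.
        runs = [open(path) for path in run_paths]
        try:
            with open(output_path, "w") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
```

In a distributed setting the same idea applies one level up: each node sorts its own share, and a coordinator (or a partitioning step) merges the sorted pieces.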
Distributed Systems: Failure Modes
When solving problems across distributed infrastructure, multiple failure scenarios emerge:
Common Failure Modes:
- Network failures (partitions, latency spikes, packet loss)
- Node crashes or malicious behavior
- Hardware heterogeneity (different capabilities across nodes)
- Software inconsistencies (OS versions, runtime environments)
- Data corruption in transit or at rest
- Persistent storage failures
- Partial failures (subset of nodes produce incorrect results)
The Challenge: Despite these failure modes, the system must complete tasks efficiently and correctly. This requires fault-tolerant design patterns, redundancy, and consensus mechanisms.
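As one illustration of redundancy, a reader can ask several replicas for the same data and accept a value only when enough of them agree. The sketch below is a simplified quorum read, not a full consensus protocol. `read_with_quorum` and the zero-argument reader callables are hypothetical stand-ins for real network calls.

```python
import hashlib
from collections import Counter


def checksum(data: bytes) -> str:
    """Content fingerprint used to compare what different replicas returned."""
    return hashlib.sha256(data).hexdigest()


def read_with_quorum(replica_reads, quorum):
    """Return the payload that at least `quorum` replicas agree on.

    `replica_reads` is a list of zero-argument callables (hypothetical
    stand-ins for network reads); each returns bytes or raises OSError.
    """
    votes = Counter()
    for read in replica_reads:
        try:
            data = read()
        except OSError:
            continue  # crashed or unreachable node: skip it
        digest = checksum(data)
        votes[digest] += 1
        if votes[digest] >= quorum:
            return data  # enough independent replicas returned this content
    raise RuntimeError("no quorum: too many replicas failed or disagreed")
```

With three replicas and a quorum of two, this tolerates one crashed or corrupted replica. Two simultaneous failures raise an error rather than returning possibly wrong data.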
Scale as a Design Driver
Core Principle: Simple problems become challenging at scale.
High-Level Design focuses on understanding:
- Scale transitions: 10² users → 10⁹ users
- Challenge identification: What breaks when scale increases by orders of magnitude
- Architectural solutions: Design patterns that handle planetary-scale problems
Scale dimensions:
- Data volume (petabytes, exabytes)
- Request throughput (millions/billions per second)
At small scale (10³ requests), single-server architectures work fine. At internet scale (10⁹+ requests), the same design collapses.
Always design for two more orders of magnitude of growth: if the system handles 10ⁿ today, plan for 10ⁿ⁺².
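A back-of-envelope estimate shows why two extra orders of magnitude change the design. The sketch below assumes a hypothetical server that sustains 5,000 requests per second and is kept at 50% utilization. Both numbers are placeholders, not measurements.

```python
import math


def servers_needed(peak_rps: int, rps_per_server: int,
                   target_utilization: float = 0.5) -> int:
    """Servers required to serve peak_rps at or below the target utilization."""
    usable_rps = rps_per_server * target_utilization
    return max(1, math.ceil(peak_rps / usable_rps))


current_peak_rps = 1_000  # assumed load today (10^3 requests per second)
for extra_orders in (0, 1, 2):
    rps = current_peak_rps * 10**extra_orders
    count = servers_needed(rps, rps_per_server=5_000)
    print(f"{rps:>7,} req/s -> {count} server(s)")
```

Under these assumptions one server is enough today, but the same workload at 100 times the traffic needs 40 servers. That in turn requires load balancing, shared state, and failure handling that a single-server design never had to consider.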
MVP: Minimum Viable Product
Definition:
- Minimum: Fewest features required
- Viable: Actually solves the problem
- Product: Demonstrates the solution
Features vs. Implementation
Critical distinction:
- Features: What the user experiences (user-facing functionality)
- Implementation: How you technically build it (databases, APIs, algorithms)
Example:
- ❌ "We need a database" → This is implementation
- ✅ "Users can save bookmarks" → This is a feature
Case Study: Delicious Bookmarking Service
The Problem (Pre-Cloud Era)
In 2003, before cloud computing was widely available:
- Browsers saved bookmarks locally on individual machines
- No synchronization across devices
- Users at cyber cafés lost bookmarks when switching computers
- Research and saved links were trapped on specific hardware
The Solution: Centralized Bookmark Storage
Build a web service where users can:
- Store bookmarks on a remote server
- Access them from any computer
- Maintain persistence across sessions
MVP Feature Set
Core Features (Must Have):
- User registration and authentication
- Add bookmark (URL + title)
- View saved bookmarks
Excluded from MVP (Can Add Later):
- Logout functionality
- Delete bookmarks
- Update/edit bookmarks
- Automatic title detection
- Thumbnail previews
- Tags or categories
- Search functionality
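The three core features are small enough to sketch in a few dozen lines. The class below is a hypothetical in-memory stand-in (the name `BookmarkService` and its methods are assumptions for illustration, not the original Delicious design). It deliberately omits everything on the excluded list, and nothing is persisted.

```python
import hashlib
import hmac
import os


class BookmarkService:
    """In-memory sketch of the three MVP features; nothing is persisted."""

    def __init__(self):
        self._users = {}      # username -> (salt, password_hash)
        self._bookmarks = {}  # username -> list of (url, title)

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, username: str, password: str) -> None:
        """Feature 1a: user registration."""
        if username in self._users:
            raise ValueError("username already taken")
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))
        self._bookmarks[username] = []

    def authenticate(self, username: str, password: str) -> bool:
        """Feature 1b: authentication."""
        record = self._users.get(username)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, self._hash(password, salt))

    def add_bookmark(self, username, password, url, title) -> None:
        """Feature 2: add bookmark (URL + title)."""
        if not self.authenticate(username, password):
            raise PermissionError("invalid credentials")
        self._bookmarks[username].append((url, title))

    def view_bookmarks(self, username, password):
        """Feature 3: view saved bookmarks."""
        if not self.authenticate(username, password):
            raise PermissionError("invalid credentials")
        return list(self._bookmarks[username])
```

Passing credentials on every call is a simplification that stands in for sessions, which is also why the MVP can omit logout.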
From Local to Distributed: The Architecture Shift
The Local Development Problem
Initial implementation:
http://127.0.0.1:8080 (localhost)
The application runs on a single machine. It works perfectly for the developer but is inaccessible to external users.
The fundamental challenge: How do we make a local application accessible globally?
Internet Connectivity Basics
Requirements for global access:
ISP (Internet Service Provider): Provides internet connectivity via physical infrastructure (fiber/copper cables → router → device)
IP Address: Every internet-connected device receives a unique identifier
- IPv4 or IPv6
- Static or dynamic allocation
- Enables device-to-device communication
Network Path: ISPs route traffic between devices across the global internet infrastructure
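Not every IP address is reachable from the internet, which is why the localhost address is not enough. The sketch below uses Python's standard `ipaddress` module to classify a few well-known addresses. The `describe` helper is a name invented for this example.

```python
import ipaddress


def describe(address: str) -> str:
    """Classify an IPv4 or IPv6 address by how far it can be reached."""
    ip = ipaddress.ip_address(address)
    if ip.is_loopback:
        return "loopback"  # reachable only from the same machine
    if ip.is_private:
        return "private"   # reachable only inside a local network
    if ip.is_global:
        return "public"    # routable across the internet
    return "other"


for addr in ("127.0.0.1", "192.168.1.10", "8.8.8.8", "::1"):
    print(f"{addr:>15} -> {describe(addr)}")
```

A service that should be reachable globally needs an address in the "public" category, either directly or through a router or load balancer that forwards traffic to it.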
The Architecture Transition
Local Architecture:
Developer Machine → Localhost Server → Local Browser
Distributed Architecture:
Client (anywhere) → Internet → Public Server → Application
This transition requires:
- Public IP address or domain name
- Server infrastructure (cloud or physical)
- Network configuration (ports, firewalls, load balancers)
- Security considerations (authentication, encryption)
The shift from local to distributed introduces all the failure modes discussed earlier, making system design critical for reliability.
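At the code level, the first step of this transition is often just the address the server binds to. The sketch below uses Python's standard `http.server` module. `HelloHandler` and `make_server` are illustrative names, and a production deployment would normally put a hardened web server or load balancer in front.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"bookmarks service is up\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str, port: int) -> ThreadingHTTPServer:
    """Create (but do not start) a server bound to host:port.

    host="127.0.0.1" accepts connections from this machine only.
    host="0.0.0.0" listens on every network interface, so other machines
    can connect if firewalls and routing allow it.
    """
    return ThreadingHTTPServer((host, port), HelloHandler)
```

Calling `make_server("127.0.0.1", 8080).serve_forever()` reproduces the local setup. Switching the host to `"0.0.0.0"` exposes the same application on the machine's other interfaces, at which point the security considerations listed above start to matter.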
Key Takeaways
- Simple problems become challenging at scale.
- MVP focuses on features, not implementation.
- Distributed systems introduce complexity.
- Always plan for two more orders of magnitude of growth (10ⁿ today, 10ⁿ⁺² tomorrow).