Meta's System Design loop skews "product-shaped" relative to other big-tech bars: generic prompts ("design a search engine") are rare, while questions built on Meta's own products (News Feed, Stories, Live Comments, Notifications, Messenger) dominate. This guide walks through the three most common problems, plugs each into Meta's eight-step framework, and flags the moves that drop a candidate to lean no hire.
Meta's Eight Steps (quick recap)
- Clarify requirements (DAU / QPS / read-write ratio)
- API design
- Data model
- High-level architecture
- Storage choices (MySQL / Cassandra / Memcached / TAO)
- Walk through a critical path
- Tradeoffs and bottlenecks
- Monitoring + recovery
The three problems below apply these steps.
Problem 1: Design News Feed
Clarify
- DAU: ~3B
- Average followers: ~300
- High-fanout users: ~10M followers
- Reads : writes ≈ 100:1 at peak
- Consistency: weakly consistent acceptable; new posts visible within 1–3 seconds
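A quick back-of-envelope pass turns these numbers into QPS targets. The sketch below is illustrative only: the 10 feed fetches per user per day and the 3x peak factor are assumptions, not Meta figures. Under those assumptions it works out to roughly 347K average read QPS, about 1.04M read QPS at peak, and about 10K write QPS at peak.

```python
# Back-of-envelope capacity estimate for the News Feed requirements.
# FEED_FETCHES_PER_USER_PER_DAY and PEAK_FACTOR are assumptions for illustration.
SECONDS_PER_DAY = 86_400

DAU = 3_000_000_000
FEED_FETCHES_PER_USER_PER_DAY = 10   # assumption
PEAK_FACTOR = 3                      # assumption: peak is about 3x the daily average
READ_WRITE_RATIO = 100               # from the clarified requirements (100:1)


def estimate_qps(dau, fetches_per_user, peak_factor, read_write_ratio):
    """Return (average read QPS, peak read QPS, peak write QPS)."""
    avg_read_qps = dau * fetches_per_user / SECONDS_PER_DAY
    peak_read_qps = avg_read_qps * peak_factor
    peak_write_qps = peak_read_qps / read_write_ratio
    return avg_read_qps, peak_read_qps, peak_write_qps


avg_read, peak_read, peak_write = estimate_qps(
    DAU, FEED_FETCHES_PER_USER_PER_DAY, PEAK_FACTOR, READ_WRITE_RATIO
)
print(f"avg read QPS:   {avg_read:,.0f}")
print(f"peak read QPS:  {peak_read:,.0f}")
print(f"peak write QPS: {peak_write:,.0f}")
```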
API
POST /feed/post {user_id, content, media[]}
GET /feed?user_id=&cursor=
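The cursor parameter on GET /feed is worth one sentence in the interview: an opaque cursor avoids offset pagination over a feed that keeps changing. A minimal sketch follows, assuming the cursor encodes the (score, post_id) of the last item the client saw. The helper names and the cursor layout are hypothetical, not Meta's actual API.

```python
import base64
import json


def encode_cursor(score, post_id):
    """Pack the last-seen (score, post_id) into an opaque URL-safe token."""
    payload = json.dumps({"score": score, "post_id": post_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["score"], data["post_id"]


def get_feed_page(feed, cursor=None, limit=2):
    """feed is a list of (score, post_id) sorted descending.

    Returns (items, next_cursor); next_cursor is None on the last page.
    """
    start = 0
    if cursor is not None:
        last = decode_cursor(cursor)
        # Resume strictly after the last item the client already saw.
        start = next((i for i, item in enumerate(feed) if item < last), len(feed))
    page = feed[start:start + limit]
    has_more = len(page) == limit and start + limit < len(feed)
    next_cursor = encode_cursor(*page[-1]) if has_more else None
    return page, next_cursor
```

Because the cursor carries a position in the ranking rather than an offset, new posts inserted at the top do not shift or duplicate items on later pages.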
Data Model
| Entity | Key fields | Storage |
|---|---|---|
| Post | post_id, author_id, content, ts, media_url | TAO + Blob |
| Follow | follower_id, followee_id, ts | TAO |
| Feed | user_id, post_id, score, ts | Memcached + offline store |
High-Level Architecture
client → API GW → Feed Service → [Cache: Memcached]
↓
Feed Builder (offline)
↑
Post Service → TAO + Blob
Follow Service → TAO
Core Tradeoff: Fan-out on Write vs Read
| Mode | Best for | Drawback |
|---|---|---|
| Fan-out on write (push) | Regular users | One celebrity post fans out to 10M+ recipients |
| Fan-out on read (pull) | High-fanout users | Higher read latency |
| Hybrid | What Meta actually runs | Adds complexity, but balances both |
Hybrid rule: users above a threshold (e.g. 1M followers) go pull, everyone else goes push.
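The hybrid rule can be sketched in a few lines. This is an in-memory toy, assuming a single follower-count threshold decides push vs pull; the class and method names are hypothetical, and real systems shard and persist every one of these maps.

```python
from collections import defaultdict

PULL_THRESHOLD = 1_000_000  # hybrid rule: authors above this are read via pull


class HybridFeed:
    def __init__(self, threshold=PULL_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> (ts, post_id) pushed at write time
        self.outbox = defaultdict(list)    # author -> (ts, post_id) of their own posts

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_high_fanout(self, author):
        return len(self.followers[author]) > self.threshold

    def post(self, author, post_id, ts):
        self.outbox[author].append((ts, post_id))
        if not self.is_high_fanout(author):
            # Fan-out on write: copy the post into every follower's inbox.
            for follower in self.followers[author]:
                self.inbox[follower].append((ts, post_id))

    def read(self, user, limit=10):
        # A set drops duplicates if an author crossed the threshold after pushing.
        items = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_high_fanout(author):
                # Fan-out on read: merge the celebrity's outbox at query time.
                items.update(self.outbox[author])
        return [post_id for _, post_id in sorted(items, reverse=True)[:limit]]
```

The point to make out loud is that the write cost of a celebrity post stays O(1), while each reader pays a small merge cost only for the few high-fanout accounts they follow.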
Details You Must Land
- Memcached lease get / lease set to dodge thundering herd
- mcrouter for routing + replication
- TAO's read-after-write consistency: master region + read-through cache
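- Ranking: an EdgeRank-style score (affinity × edge weight × time decay) works as whiteboard shorthand; note that production ranking has long since moved to ML models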
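To make the lease idea concrete, here is a toy in-process model of lease get / lease set. It is a sketch of the concept only, not the memcached wire protocol or client API; the class and method names are hypothetical, and lease expiry is left out for brevity.

```python
import itertools


class LeaseCache:
    """Toy model: on a miss, only one caller gets a token and refills the key."""

    def __init__(self):
        self._data = {}
        self._leases = {}                 # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def lease_get(self, key):
        """Return (value, token). On a miss, exactly one caller receives a token."""
        if key in self._data:
            return self._data[key], None
        if key in self._leases:
            # Another client is already refilling: back off and retry later
            # instead of stampeding the database.
            return None, None
        token = next(self._tokens)
        self._leases[key] = token
        return None, token

    def lease_set(self, key, value, token):
        """Store the value only if the token is still the live lease for the key."""
        if token is None or self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key):
        """Invalidate on write; this also voids any outstanding lease."""
        self._data.pop(key, None)
        self._leases.pop(key, None)
```

Two properties are worth naming: concurrent misses collapse into one database read (thundering herd), and a fill that raced with an invalidation is rejected (stale set).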
Easy Traps
- Discussing only push, without comparing push, pull, and hybrid
- Saying "Redis" when you mean Memcached (Meta doesn't run Redis as the main cache)
- Skipping hot keys (celebrity posts)
Problem 2: Design Live Comments
The real-time comment stream under a Facebook Live or Instagram Live broadcast.
Clarify
- Concurrent viewers on a single broadcast: 10M+
- Comment QPS peaks: 100K+ per broadcast
- End-to-end latency: comment visible to viewers in < 1 second
- Persistence requirement is low: not every comment needs to be stored forever
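Multiplying these numbers out shows why naive fan-out is off the table. The sketch below assumes a viewer can usefully see about 10 comments per second; that display cap is an illustrative assumption, not a Meta figure.

```python
# Delivery math for a single large broadcast.
VIEWERS = 10_000_000
COMMENT_QPS = 100_000
DISPLAY_CAP_PER_SEC = 10  # assumption: comments a viewer can plausibly read per second

# Every comment to every viewer: 10^12 deliveries per second.
naive_deliveries_per_sec = VIEWERS * COMMENT_QPS

# Cap what each viewer receives: 10^8 deliveries per second.
sampled_deliveries_per_sec = VIEWERS * DISPLAY_CAP_PER_SEC

# Fraction of comments that survive sampling at peak.
keep_ratio = DISPLAY_CAP_PER_SEC / COMMENT_QPS

print(f"naive:   {naive_deliveries_per_sec:.1e} deliveries/s")
print(f"sampled: {sampled_deliveries_per_sec:.1e} deliveries/s")
print(f"keep ratio at peak: {keep_ratio:.4%}")
```

At peak, only about 1 comment in 10,000 reaches any given viewer under this cap, which is the argument for sampling and for the low persistence requirement.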
Transport Choice
| Option | Pro | Con |
|---|---|---|
| Polling | Simple | Latency + bandwidth waste |
| Long polling | Bandwidth-friendly | High server connection count |
| WebSocket | True bidirectional, low latency | Connection management + reconnect logic |
| Server-Sent Events | Unidirectional simplicity | No binary support |
Meta's actual stack: WebSocket plus an MQTT-like protocol.
High-Level Architecture
client ⇄ Edge WS Server (region) ⇄ Pub/Sub (Kafka-like)
↓
Comment Service → Hot Storage (Memcached/Redis)
↓
Sampling / ML → Cold Storage
Key Design Calls
- Sharding: hash by live_id into separate pub/sub channels
- Backpressure: when comment rate exceeds threshold, downsample silently
- Reconnect: client resumes via cursor after a drop
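The sharding and backpressure calls above can be sketched as follows. The channel count and per-second cap are assumed values, and the function names are hypothetical; reconnect handling is not covered here.

```python
import hashlib
import random

NUM_CHANNELS = 64            # assumption: number of pub/sub channels
MAX_BROADCAST_PER_SEC = 20   # assumption: comments forwarded per one-second batch


def channel_for(live_id, num_channels=NUM_CHANNELS):
    """Map a broadcast to a channel with a hash that is stable across processes.

    Python's built-in hash() is salted per process for strings, so a digest is used.
    """
    digest = hashlib.md5(live_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_channels


def downsample(comments, max_out=MAX_BROADCAST_PER_SEC, rng=random):
    """Forward at most max_out comments from a batch, preserving arrival order."""
    if len(comments) <= max_out:
        return list(comments)
    kept = sorted(rng.sample(range(len(comments)), max_out))
    return [comments[i] for i in kept]
```

Uniform random sampling is the simplest policy; the architecture's "Sampling / ML" box suggests a ranked selection could replace `rng.sample` without changing the interface.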
Easy Traps
- Falling back to HTTP polling without acknowledging WebSocket
- Persisting every comment to MySQL (the math doesn't work)
- Skipping spam / rate limiting
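For the rate-limiting trap, a per-user token bucket is usually enough to name and sketch. The version below is one common approach, not a description of Meta's implementation; the class name and parameters are hypothetical, and in practice there would be one bucket per commenting user.

```python
import time


class TokenBucket:
    """Allow `rate_per_sec` sustained actions with bursts of up to `burst`."""

    def __init__(self, rate_per_sec, burst, now=None):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and spend a token if the action is within the limit."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The optional `now` argument exists only so the behavior can be checked deterministically; production callers would omit it.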
Problem 3: Design Notification System
Unified system for in-app, push (APNs/FCM), and email notifications: likes, comments, mentions, friend requests.
Clarify
- Types: in-app, push, email
- User preferences: each type can be muted or silenced
- Volume: ~10B notifications per day
- Deduplication + aggregation (don't send 10 separate notes when 10 people like the same post)
High-Level Architecture
Event source (post like/comment) → Notification Producer → Kafka
↓
Notification Service (workers)
↓
┌───────────────────┬────────────────────┐
APNs/FCM      Email Service        In-app store
Critical Choices
| Dimension | Design |
|---|---|
| Dedup | Window-based: collapse (target_user, event_type, source_id) within 5 minutes |
| Aggregate | "X, Y and 8 others liked your post" |
| Preferences | Check the preference cache before events reach the Notification Service workers, so muted types are dropped early |
| Rate limiting | Per target_user_id, prevent flooding |
| Retry | Exponential backoff, dead-letter queue beyond threshold |
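The dedup and aggregation rows can be combined into one windowed structure. A minimal in-memory sketch follows, assuming events are keyed by (target_user, event_type, source_id) and flushed once a 5-minute window closes. The class, the VERBS map, and the message format are hypothetical.

```python
WINDOW_SECONDS = 300  # 5-minute collapse window from the table above

VERBS = {"like": "liked your post", "comment": "commented on your post"}


def render(actors, verb):
    """Build the aggregated message text for one window."""
    if len(actors) == 1:
        return f"{actors[0]} {verb}"
    if len(actors) == 2:
        return f"{actors[0]} and {actors[1]} {verb}"
    others = len(actors) - 2
    noun = "other" if others == 1 else "others"
    return f"{actors[0]}, {actors[1]} and {others} {noun} {verb}"


class Aggregator:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.buckets = {}  # (target_user, event_type, source_id) -> bucket

    def add(self, target_user, event_type, source_id, actor, ts):
        key = (target_user, event_type, source_id)
        bucket = self.buckets.setdefault(key, {"start": ts, "actors": []})
        if actor not in bucket["actors"]:  # dedup repeated events from one actor
            bucket["actors"].append(actor)

    def flush(self, now):
        """Emit one (target_user, message) per bucket whose window has closed."""
        ready = [k for k, b in self.buckets.items() if now - b["start"] >= self.window]
        out = []
        for key in ready:
            bucket = self.buckets.pop(key)
            out.append((key[0], render(bucket["actors"], VERBS[key[1]])))
        return out
```

The tradeoff to state is latency vs noise: a longer window collapses more events into one notification but delays the first one.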
Easy Traps
- Skipping dedup / aggregation (Meta weighs this heavily)
- Ignoring APNs throttling (too many consecutive sends to the same device token are rejected with a 429 TooManyRequests error)
- Forgetting user preference check
Three Moves That Score in Any Problem
- Draw a clean architecture — 5 boxes, not 30 components
- Walk through a critical path end-to-end at least once
- State tradeoffs explicitly ("X is faster but uses more memory")
A Real Strong-Hire Pattern
Our students who got Meta SD strong-hire share one trait: they cover 4–6 explicit tradeoffs + at least one deep critical-path walkthrough within 45 minutes. Our VO assistance flow runs problem-by-problem mocks with recording playback and explicit hire / no-hire flagging.
FAQ
Are Meta SD questions always Meta products?
About 80%. The other 20% are generic (chat, URL shortener). Even on generic prompts, interviewers like to compare against Meta's real stack.
What if I don't know Memcached well?
At minimum, articulate Memcached vs Redis tradeoffs: Memcached is multi-threaded, pure LRU, no persistence; Redis is single-threaded core, richer data structures, multiple persistence modes.
Do I draw, or does the interviewer?
You draw. Meta SD interviewers hand you a whiteboard tool (Excalidraw or an internal equivalent) — you're expected to use it.
How do the mid-level and senior SD bars differ?
Mid (E4) needs high-level + one solid tradeoff to clear. Senior (E5+) must go deep on storage selection, capacity estimation, and failure modes.