Meta System Design Interview Deep Dive | News Feed / Live Comments / Notification Templates That Pass

Meta's System Design loop skews "product-shaped" relative to other big-tech bars — generic prompts ("design a search engine") are rare, while questions built on Meta's own products (News Feed, Stories, Live Comments, Notification, Messenger) dominate. This guide walks through the three most common shapes, plugs them into Meta's eight-step framework, and flags the moves that drop a candidate to lean no hire.

Meta's Eight Steps (quick recap)

Clarify requirements (DAU / QPS / read-write ratio)
API design
Data model
High-level architecture
Storage choices (MySQL / Cassandra / Memcached / TAO)
Walk through a critical path
Tradeoffs and bottlenecks
Monitoring + recovery

The three problems below apply these steps in order.

Problem 1: Design News Feed

Clarify

DAU: ~3B
Average followers: ~300
High-fanout users: ~10M followers
Reads : writes ≈ 100:1 at peak
Consistency: eventual consistency is acceptable; new posts should become visible within 1–3 seconds
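
The clarify numbers above can be turned into a quick capacity estimate. The sketch below is back-of-envelope arithmetic only: DAU, follower count, and the read:write ratio come from the list above, while FEED_LOADS_PER_USER_PER_DAY and PEAK_FACTOR are assumptions picked for illustration, not Meta figures.

```python
# Back-of-envelope capacity estimate for News Feed.
SECONDS_PER_DAY = 86_400

DAU = 3_000_000_000                 # from the clarify list
AVG_FOLLOWERS = 300                 # from the clarify list
READ_WRITE_RATIO = 100              # reads per write, from the clarify list
FEED_LOADS_PER_USER_PER_DAY = 10    # assumption for illustration
PEAK_FACTOR = 2                     # assumption: peak is about 2x average


def estimate() -> dict:
    avg_read_qps = DAU * FEED_LOADS_PER_USER_PER_DAY / SECONDS_PER_DAY
    peak_read_qps = avg_read_qps * PEAK_FACTOR
    peak_write_qps = peak_read_qps / READ_WRITE_RATIO
    # With pure fan-out on write, each post becomes one feed insert per follower.
    peak_fanout_inserts = peak_write_qps * AVG_FOLLOWERS
    return {
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": peak_read_qps,
        "peak_write_qps": peak_write_qps,
        "peak_fanout_inserts": peak_fanout_inserts,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the average read load is roughly 350K QPS, and pure push would generate about 2M feed inserts per second at peak before counting any high-fanout author. Stating the assumed inputs out loud matters more than the exact result.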

API

POST /feed/post     {user_id, content, media[]}
GET  /feed?user_id=&cursor=
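
The cursor parameter on GET /feed is worth explaining, since offset pagination breaks when new posts arrive between requests. A minimal sketch follows, assuming the cursor is an opaque token that encodes the (timestamp, post_id) of the last item returned; the function names and the in-memory feed list are hypothetical stand-ins for the real service.

```python
import base64
import json
from typing import List, Optional, Tuple

Post = Tuple[int, str]  # (timestamp, post_id); hypothetical shape


def encode_cursor(ts: int, post_id: str) -> str:
    """Pack the last-seen position into an opaque, URL-safe token."""
    raw = json.dumps([ts, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> Post:
    ts, post_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return ts, post_id


def get_feed(feed: List[Post], cursor: Optional[str], limit: int = 2):
    """Return (page, next_cursor). feed must be sorted newest first."""
    if cursor:
        last = decode_cursor(cursor)
        # Keep only items strictly older than the last one already served.
        remaining = [p for p in feed if p < last]
    else:
        remaining = feed
    page = remaining[:limit]
    next_cursor = encode_cursor(*page[-1]) if len(remaining) > limit else None
    return page, next_cursor
```

Because the cursor pins a position rather than an offset, a post published between two requests does not shift or duplicate items on the next page.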

Data Model

Entity	Key fields	Storage
Post	post_id, author_id, content, ts, media_url	TAO + Blob
Follow	follower_id, followee_id, ts	TAO
Feed	user_id, post_id, score, ts	Memcached + offline store

High-Level Architecture

client → API GW → Feed Service → [Cache: Memcached]
                                ↓
                       Feed Builder (offline)
                                ↑
                       Post Service → TAO + Blob
                       Follow Service → TAO

Core Tradeoff: Fan-out on Write vs Read

Mode	Best for	Drawback
Fan-out on write (push)	Regular users	One celebrity post fans out to 10M+ recipients
Fan-out on read (pull)	High-fanout users	Higher read latency
Hybrid	What Meta actually runs	Adds complexity, but balances both

Hybrid rule: users above a threshold (e.g. 1M followers) go pull, everyone else goes push.
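
The hybrid rule can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not Meta's implementation: the HybridFeed class, its inbox/outbox dictionaries, and the default threshold are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PULL_THRESHOLD = 1_000_000  # followers above this make an author pull-only


class HybridFeed:
    def __init__(self, threshold: int = PULL_THRESHOLD):
        self.threshold = threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)  # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)  # user -> authors
        self.inbox: Dict[str, List[Tuple[int, str]]] = defaultdict(list)   # pushed feed
        self.outbox: Dict[str, List[Tuple[int, str]]] = defaultdict(list)  # posts by author

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_high_fanout(self, author: str) -> bool:
        return len(self.followers[author]) > self.threshold

    def post(self, author: str, post_id: str, ts: int) -> None:
        self.outbox[author].append((ts, post_id))
        if not self.is_high_fanout(author):
            # Push path: write into every follower's precomputed feed.
            for follower in self.followers[author]:
                self.inbox[follower].append((ts, post_id))

    def read(self, user: str, limit: int = 10) -> List[str]:
        # A set removes duplicates if an author crossed the threshold mid-life.
        merged = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_high_fanout(author):
                # Pull path: merge the celebrity's posts at read time.
                merged.update(self.outbox[author])
        return [post_id for _, post_id in sorted(merged, reverse=True)[:limit]]
```

The write cost for a regular author is proportional to their follower count, while a high-fanout author costs one outbox write and shifts the work to each reader's merge step.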

Details You Must Land

Memcached lease get / lease set to dodge thundering herd
mcrouter for routing + replication
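TAO's consistency model: writes are forwarded to the master region through a write-through cache, giving read-after-write consistency within a cache tier and eventual consistency across regions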
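Ranking: an EdgeRank-style score (affinity × edge weight × time decay) is a fine starting point, but note that Facebook has since replaced EdgeRank with ML-based ranking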
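
The lease get / lease set idea is easier to explain with a toy model. The sketch below is an in-memory stand-in that mimics lease semantics; LeaseCache and its methods are hypothetical names, not a real Memcached client API, and lease expiry is left out for brevity.

```python
import itertools
from typing import Dict, Optional, Tuple


class LeaseCache:
    """Toy cache: only the first client to miss a key may refill it."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._leases: Dict[str, int] = {}
        self._tokens = itertools.count(1)

    def lease_get(self, key: str) -> Tuple[Optional[str], Optional[int]]:
        """Hit: (value, None). First miss: (None, token). Later misses: (None, None)."""
        if key in self._data:
            return self._data[key], None
        if key in self._leases:
            # Another client holds the lease; caller should wait and retry
            # instead of hitting the database.
            return None, None
        token = next(self._tokens)
        self._leases[key] = token
        return None, token

    def lease_set(self, key: str, value: str, token: Optional[int]) -> bool:
        """Store value only if the token is still the valid lease for key."""
        if token is None or self._leases.get(key) != token:
            return False  # lease invalidated by a delete, or never granted
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key: str) -> None:
        """Invalidate the value and any outstanding lease (e.g. after a write)."""
        self._data.pop(key, None)
        self._leases.pop(key, None)
```

When a hot key expires, one client receives a token and goes to the database while the rest back off, which is what prevents the thundering herd. Rejecting a set whose lease was invalidated by a delete also keeps a slow reader from writing a stale value back into the cache.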

Easy Traps

Talking only push, never push/pull/hybrid
Saying "Redis" when you mean Memcached (Meta doesn't run Redis as the main cache)
Skipping hot keys (celebrity posts)

Problem 2: Design Live Comments

The real-time comment stream under a Facebook Live or Instagram Live broadcast.

Clarify

Concurrent viewers on a single broadcast: 10M+
Comment QPS peaks: 100K+ per broadcast
End-to-end latency: comment visible to viewers in < 1 second
Persistence requirement is low: not every comment needs permanent storage
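
These numbers already show why every comment cannot reach every viewer. A short worked calculation follows; READABLE_PER_SECOND and AVG_COMMENT_BYTES are assumptions for illustration, while the viewer count and comment QPS come from the list above.

```python
VIEWERS = 10_000_000        # concurrent viewers, from the clarify list
COMMENT_QPS = 100_000       # peak comments per second, from the clarify list
READABLE_PER_SECOND = 10    # assumption: comments/s a viewer can actually follow
AVG_COMMENT_BYTES = 200     # assumption: average payload size


def live_comment_math() -> dict:
    naive = VIEWERS * COMMENT_QPS            # deliver everything to everyone
    sampled = VIEWERS * READABLE_PER_SECOND  # deliver a readable subset
    return {
        "naive_deliveries_per_s": naive,
        "sampled_deliveries_per_s": sampled,
        "keep_ratio": READABLE_PER_SECOND / COMMENT_QPS,
        "sampled_egress_bytes_per_s": sampled * AVG_COMMENT_BYTES,
    }


if __name__ == "__main__":
    for name, value in live_comment_math().items():
        print(f"{name}: {value:,}")
```

Naive fan-out is 10^12 deliveries per second, which no transport will carry. Capping the stream at a readable rate brings that to 10^8 deliveries per second, about 20 GB/s of egress at the assumed payload size, and implies keeping roughly 1 in 10,000 comments at peak.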

Transport Choice

Option	Pro	Con
Polling	Simple	Latency + bandwidth waste
Long polling	Bandwidth-friendly	High server connection count
WebSocket	True bidirectional, low latency	Connection management + reconnect logic
Server-Sent Events	Unidirectional simplicity	No binary support

Meta's actual stack: WebSocket plus an MQTT-like protocol.

High-Level Architecture

client ⇄ Edge WS Server (region) ⇄ Pub/Sub (Kafka-like)
                                          ↓
                            Comment Service → Hot Storage (Memcached/Redis)
                                          ↓
                            Sampling / ML → Cold Storage

Key Design Calls

Sharding: hash by live_id into separate pub/sub channels
Backpressure: when comment rate exceeds threshold, downsample silently
Reconnect: client resumes via cursor after a drop
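
The three design calls above can be sketched together. This is a simplified single-process model under stated assumptions: channel_for, CommentStream, and the per-second batch interface are hypothetical names for illustration, and a real system would run the sampler inside the pub/sub consumers.

```python
import hashlib
import random
from typing import List, Optional, Tuple


def channel_for(live_id: str, num_channels: int) -> int:
    """Sharding: a stable hash of live_id picks the pub/sub channel."""
    digest = hashlib.sha256(live_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_channels


class CommentStream:
    def __init__(self, max_per_second: int, rng: Optional[random.Random] = None):
        self.max_per_second = max_per_second
        self.rng = rng or random.Random()
        self.log: List[Tuple[int, str]] = []  # (sequence number, text) delivered
        self._next_seq = 1

    def ingest(self, batch: List[str]) -> List[Tuple[int, str]]:
        """Take one second of comments; deliver at most max_per_second of them."""
        if len(batch) > self.max_per_second:
            # Backpressure: uniform random sample, preserving arrival order.
            keep = sorted(self.rng.sample(range(len(batch)), self.max_per_second))
            batch = [batch[i] for i in keep]
        delivered = []
        for text in batch:
            delivered.append((self._next_seq, text))
            self._next_seq += 1
        self.log.extend(delivered)
        return delivered

    def resume(self, cursor: int) -> List[Tuple[int, str]]:
        """Reconnect: replay everything delivered after the client's last seq."""
        return [c for c in self.log if c[0] > cursor]
```

Sequence numbers are assigned after sampling, so a reconnecting client sees a gap-free stream and never learns which comments were dropped. Uniform sampling is the simplest policy; the architecture diagram's Sampling / ML stage suggests a ranked choice in practice.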

Easy Traps

Falling back to HTTP polling without acknowledging WebSocket
Persisting every comment to MySQL (the math doesn't work)
Skipping spam / rate limiting

Problem 3: Design Notification System

Unified system for in-app, push (APNs/FCM), and email notifications: likes, comments, mentions, friend requests.

Clarify

Types: in-app, push, email
User preferences: each type can be muted or silenced independently
Volume: ~10B notifications per day
Deduplication + aggregation (don't send 10 separate notifications when 10 people like the same post)

High-Level Architecture

Event source (post like/comment) → Notification Producer → Kafka
                                                            ↓
                                       Notification Service (workers)
                                                            ↓
                            ┌───────────────────┬────────────────────┐
                       APNs/FCM             Email Service        In-app store

Critical Choices

Dimension	Design
Dedup	Window-based: collapse (target_user, event_type, source_id) within 5 minutes
Aggregate	"X, Y and 8 others liked your post"
Preferences	Check the preference cache before fanning out to any delivery channel, so muted types are dropped early
Rate limiting	Per target_user_id, prevent flooding
Retry	Exponential backoff, dead-letter queue beyond threshold
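
The dedup and aggregate rows can be sketched as one component. The Aggregator class, the VERBS table, and the flush-on-next-event behavior below are assumptions for illustration; a production version would also flush on a timer.

```python
from typing import Dict, List, Optional, Tuple

WINDOW_SECONDS = 300  # the 5-minute window from the table above
VERBS = {"like": "liked your post", "comment": "commented on your post"}

Key = Tuple[str, str, str]  # (target_user, event_type, source_id)


def render(actors: List[str], event_type: str) -> str:
    verb = VERBS.get(event_type, event_type)
    if len(actors) == 1:
        who = actors[0]
    elif len(actors) == 2:
        who = f"{actors[0]} and {actors[1]}"
    else:
        rest = len(actors) - 2
        who = f"{actors[0]}, {actors[1]} and {rest} other{'s' if rest != 1 else ''}"
    return f"{who} {verb}"


class Aggregator:
    def __init__(self, window: int = WINDOW_SECONDS):
        self.window = window
        self._open: Dict[Key, Tuple[int, List[str]]] = {}  # key -> (start_ts, actors)

    def add(self, target_user: str, event_type: str, source_id: str,
            actor: str, ts: int) -> Optional[str]:
        """Record an event; return a rendered message if an old window closed."""
        key = (target_user, event_type, source_id)
        flushed = None
        if key in self._open and ts - self._open[key][0] >= self.window:
            flushed = self.flush(key)
        _, actors = self._open.setdefault(key, (ts, []))
        if actor not in actors:  # dedup: same actor counted once per window
            actors.append(actor)
        return flushed

    def flush(self, key: Key) -> str:
        """Close the window for key and render one aggregated notification."""
        _, actors = self._open.pop(key)
        return render(actors, key[1])
```

Ten likes on the same post inside one window collapse into a single "u0, u1 and 8 others liked your post" message rather than ten pushes.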

Easy Traps

Skipping dedup / aggregation (Meta weighs this heavily)
Ignoring APNs rate limits (APNs throttles rapid repeated sends to the same device token and rejects them with a 429 TooManyRequests response)
Forgetting user preference check
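
The retry policy from Critical Choices, which is also the right response to provider throttling, can be sketched as follows. This assumes full-jitter exponential backoff and a plain list standing in for the dead-letter queue; send_with_retry, its parameters, and the no-op default sleep are hypothetical, chosen so the example runs instantly.

```python
import random
from typing import Callable, List, Optional


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full jitter: delay i is uniform in [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(max_retries)]


def send_with_retry(send: Callable[[dict], bool], payload: dict,
                    dead_letters: List[dict], max_attempts: int = 5,
                    sleep: Callable[[float], None] = lambda seconds: None) -> bool:
    """Call send(payload) until it returns True or attempts run out.

    Exhausted payloads go to dead_letters. Pass time.sleep as `sleep`
    to actually wait between attempts.
    """
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(max_attempts):
        if send(payload):
            return True
        if attempt < max_attempts - 1:
            sleep(delays[attempt])
    dead_letters.append(payload)
    return False
```

Jitter spreads retries out so that a burst of throttled sends does not come back as a synchronized second burst, and the dead-letter list keeps failed notifications inspectable instead of silently lost.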

Three Moves That Score in Any Problem

Draw a clean architecture — 5 boxes, not 30 components
Walk through a critical path end-to-end at least once
State tradeoffs explicitly ("X is faster but uses more memory")

A Real Strong-Hire Pattern

Our students who got Meta SD strong-hire share one trait: they cover 4–6 explicit tradeoffs + at least one deep critical-path walkthrough within 45 minutes. Our VO assistance flow runs problem-by-problem mocks with recording playback and explicit hire / no-hire flagging.

For pricing and slots, ping WeChat Coding0201.

FAQ

Are Meta SD questions always Meta products?

About 80%. The other 20% are generic (chat, URL shortener). Even on generic prompts, interviewers like to compare against Meta's real stack.

What if I don't know Memcached well?

At minimum, articulate Memcached vs Redis tradeoffs: Memcached is multi-threaded, uses LRU eviction, and has no persistence; Redis executes commands on a single thread, offers richer data structures, and supports multiple persistence modes.

Do I draw, or does the interviewer?

You draw. Meta SD interviewers hand you a whiteboard tool (Excalidraw or an internal equivalent) — you're expected to use it.

Mid-level vs Senior SD difference?

Mid (E4) needs high-level + one solid tradeoff to clear. Senior (E5+) must go deep on storage selection, capacity estimation, and failure modes.

Preparing for Meta, Google, or Amazon system design rounds?

oavoservice continuously tracks system design questions and scoring rubrics at top firms. Mentors are front-line Staff / Senior SWEs and can provide Meta-product specials, eight-step framework mocks, storage-selection training, and capacity-estimation drilling as VO assistance.

👉 Add WeChat: Coding0201 — Get the Meta System Design prep package.

Contact

Telegram: @OAVOProxy