System Design Interview

Design a Twitter / X News Feed

Simple to describe, impossible to scale — the fan-out problem is the heart of every social feed interview.

Levels covered: L3 (Working System) · L5 (Tradeoffs) · L7 (Architecture Ownership)
~35 min read
A stick figure shouting into a megaphone at a massive crowd of followers, illustrating the fan-out problem
01

What the interviewer is testing

The news feed question is a proxy for one real question: do you understand write amplification? Any system that must broadcast one event to many recipients has to decide whether to do work at write time or at read time. The tradeoffs are not obvious, and they change depending on the data distribution — specifically, whether users follow power-law distributions (a handful of accounts with tens of millions of followers dominating the write path).

Three secondary signals matter here. A naive design — query the database on every feed load — simply does not work at this scale; the numbers make that obvious before any architecture is drawn. Cache consistency is non-trivial in a system with two independent write paths (fan-out writes timeline IDs, engagement writes update tweet objects); recognising that the two caches must be invalidated separately is a design smell test. And at L5+, ranking is the difference between a feed that feels alive and one that feels like a log dump — the system must be built to support re-scoring even if the initial design is chronological.

Level | The core question | What gets you to the next level
L3/L4 | Can you design a working feed with basic fan-out? | Identifying the celebrity write-amplification problem before the interviewer does
L5 | Can you articulate the push/pull tradeoff and the hybrid? | Discussing ranking signals and the cost of algorithmic re-ranking at scale
L6 | Can you own the whole design including observability? | Addressing cross-region replication, backfill on follow, and graceful degradation
L7/L8 | Should we build this, and what does the business optimise for? | Framing the ranking objective function and its business consequences
02

Requirements clarification

Before drawing a single box, pin down scope. A news feed interview can go many directions — live sports scores, Instagram-style media, LinkedIn professional posts — and each changes the design materially.

Functional requirements

Requirement | In scope | Out of scope (today)
Post a tweet (text, image, video link) | ✓ | Live video streaming
Home timeline (aggregated feed of followees) | ✓ | Ads insertion / sponsored content
User timeline (profile page, own tweets) | ✓ | DMs, Spaces
Follow / unfollow | ✓ | Lists, algorithmic discovery
Like, retweet, reply counts | ✓ (eventual) | Full comment threading
Real-time new tweet notification | ✓ (best effort) | Push notification delivery guarantees

Non-functional requirements

NFR | Target | Why this number
Feed load latency (p99) | < 200 ms | User perception threshold; beyond this, engagement drops measurably
Feed freshness | < 5 s lag after post (steady-state median) | Social context; seeing a 30-second-old tweet during a live event feels broken. Applies in steady state — celebrity bursts and autoscaling windows will occasionally exceed it.
Availability | 99.99% | ~52 min/year downtime; social feeds are not life-critical but user expectation is "always on" — which means Redis and API failover must complete within 30 s to stay within the annual budget.
Consistency (engagement counts) | Eventual, ≤ 10 s | Like counts are not transactional; approximate counts are acceptable and expected
Durability (tweets) | 99.999999% (8 nines) | Losing a tweet is a trust-destroying user-visible event
Write scalability | 100 K tweets/s burst | Live events (World Cup, elections) spike the write path by 50–100×
Why 200 ms for feed load?

200 ms is the threshold at which users consciously perceive a delay as "the app is loading." Below it, the feed feels instant. Above it, engagement starts dropping — Twitter's own research cited double-digit session-time reductions when median feed latency exceeded 300 ms.

What this drives: Pre-computed timelines stored in an in-memory cache (§8). Assembling a feed by querying a database at request time — even with indexes — cannot reliably hit 200 ms p99 at scale without caching.
Why eventual consistency for like counts?

Like counts are social signals, not financial ledgers. If two users see 1,402 vs 1,403 likes simultaneously, neither is harmed. Enforcing strong consistency on every like would require distributed locking on every engagement event — an extreme cost for no real user benefit.

What this drives: Counter updates via a message queue (Kafka) and a counter aggregation service (§4). Counts are written to an eventually-consistent store and periodically flushed. This decouples engagement volume from tweet-read latency.
Why 5 s freshness?

Five seconds is a practical boundary between "real-time" and "delayed" in a social context. During live events, even 10–15 s lag can mean users see spoilers elsewhere first and lose trust in the platform as a live information source. This is a steady-state median target, not a worst-case bound — celebrity-tweet bursts and autoscaling provisioning windows (30–60 s) will occasionally exceed it, and that is expected and acceptable.

What this drives: Fan-out via an asynchronous message queue with low-latency consumers (§5). Cache pre-computation must complete within the freshness budget for the typical case. This rules out lazy pull-only timelines as the sole strategy, but does not require synchronous fan-out before returning a tweet response.
03

Capacity estimation

Twitter-scale systems are write-amplified: one tweet may need to be placed into millions of follower timelines. The defaults below reflect a large-scale Twitter-like platform — 100M tweets per day, 200M active users — roughly comparable to Twitter's 2019–2020 scale. Adjust the defaults to match the scale you're discussing in your interview.

Four dimensions matter here. Traffic dominates the cache sizing and fan-out throughput. Storage is driven by tweet retention and media references. The read:write ratio determines how heavily the cache must absorb reads relative to writes — on social platforms this is typically 30–100×. The critical output is fan-out QPS — the rate at which timeline entries are written when tweets arrive.

Capacity estimator — defaults and derived values

Input | Default
Tweets per day | 100 M
Avg followers per user | 200
Read:write ratio | 30×
Tweet size | 600 B
Retention | 5 yrs
Active users | 200 M (fixed)

Output | Value
Write QPS | ~1.2 K tweets/sec
Fan-out QPS | ~230 K timeline writes/sec
Read QPS | ~35 K feed reads/sec
Tweet storage | ~110 TB (tweet rows)
Timeline cache size | ~1.3 TB (800 IDs/user, 8 B/ID)
Peak write QPS | ~23 K tweets/sec (20× surge)

⚠️

The fan-out number is the critical output. With 200 avg followers, 100M tweets/day yields ~230K timeline writes per second in steady state. A single celebrity tweet to 50M followers pushes 50M writes in seconds — roughly the entire platform's steady-state fan-out in one event. This is why the hybrid fan-out strategy (§5) exists.
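The arithmetic behind these figures fits in a few lines, using the §3 defaults as inputs:

```python
# Back-of-envelope estimation from the §3 defaults: 100M tweets/day,
# 200 avg followers, 30x read:write, 600 B/tweet, 5-year retention,
# 200M active users, 800 cached IDs of 8 bytes each.

TWEETS_PER_DAY = 100_000_000
AVG_FOLLOWERS = 200
READ_WRITE_RATIO = 30
TWEET_BYTES = 600
RETENTION_DAYS = 5 * 365
ACTIVE_USERS = 200_000_000
CACHED_IDS, ID_BYTES = 800, 8

write_qps = TWEETS_PER_DAY / 86_400                    # ~1.2K tweets/s
fanout_qps = write_qps * AVG_FOLLOWERS                 # ~230K timeline writes/s
read_qps = write_qps * READ_WRITE_RATIO                # ~35K feed reads/s
storage_tb = TWEETS_PER_DAY * TWEET_BYTES * RETENTION_DAYS / 1e12  # ~110 TB
cache_tb = ACTIVE_USERS * CACHED_IDS * ID_BYTES / 1e12             # ~1.3 TB

print(f"write {write_qps:,.0f}/s, fan-out {fanout_qps:,.0f}/s, "
      f"read {read_qps:,.0f}/s, storage {storage_tb:.0f} TB, "
      f"timeline cache {cache_tb:.2f} TB")
```

Changing one input (say, average followers to 2,000 for a B2B network) immediately shows how the fan-out QPS, not the raw write QPS, is the number that breaks first.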

04

High-level architecture

[High-level architecture diagram]
High-level architecture. Write path: Client → LB → Write API → Tweet Store+Kafka → Fan-out → Timeline Cache. Fan-out reads the User Graph (up, same column) and signals the Notification Gateway (dotted). Read API hits the Timeline Cache (right) then the Hydration Cache below it. Gateway pushes real-time events back to the client (red).

Component breakdown

Load Balancer routes traffic to write and read API clusters separately. This allows independent scaling: write capacity scales with tweet volume, read capacity with user sessions.

Write API validates the tweet (content, media references, rate limits), writes durably to the tweet store, then emits a fan-out event to a durable message queue (Kafka). The response returns to the client after the durable write — not after fan-out completes. Media is uploaded separately via a pre-signed object store URL (e.g. Amazon S3) before the tweet is posted; the media_ids field carries references to already-uploaded objects, served to end users through a globally distributed CDN.

Tweet Store is a wide-column database (Apache Cassandra) keyed on user_id + tweet_id. This access pattern — "give me all tweets by user X" — maps naturally onto Cassandra's partition model. Cassandra's multi-datacenter replication satisfies the 8-nine durability NFR from §2.

Message Queue (Kafka) buffers tweet events between the write API and fan-out consumers. It absorbs bursty celebrity-tweet spikes, ensures at-least-once delivery to fan-out workers, and provides a replay mechanism for consumer failures.

Fan-out Service consumes from Kafka, looks up the tweeter's follower list in the user graph, and writes a tweet ID reference into each follower's pre-computed timeline in the timeline cache. Normal users trigger push fan-out; celebrities (above a follower threshold) are skipped and handled at read time (§5).

User Graph stores follow relationships. Reads are extremely hot — every fan-out needs the full follower list for a user. A graph database (or a custom adjacency-list service like Twitter's FlockDB) is optimised for this traversal. The follower list for an active user is cached in memory.

Timeline Cache (Redis) stores pre-computed home timelines as sorted lists of tweet IDs per user. Each list holds ~800 recent tweet IDs (not full objects). Reads hit this cache first; a cache miss triggers a slower rebuild from the tweet store.

Hydration Cache (Redis) stores full tweet objects, keyed by tweet ID. When the read API fetches tweet IDs from the timeline cache, it hydrates them in a parallel batch from this cache, falling back to the tweet store. This separates "what is in your feed" from "what does each tweet contain," allowing engagement counts to update without invalidating timelines.

Notification Gateway handles real-time delivery to connected clients via a persistent connection protocol (WebSocket or Server-Sent Events). When the fan-out service writes to a follower's timeline cache, it also publishes to a per-user notification topic; the gateway pushes this signal to any active client connection. Mobile clients without an active connection receive a platform push notification — APNs for iOS, FCM for Android — which wakes the app to fetch the delta from the feed service.
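The two-cache read path above (timeline cache for IDs, hydration cache for objects, Cassandra as fallback) can be sketched in a few lines. Plain dicts stand in for the two Redis clusters, and the function names are illustrative; in production the object lookup would be one batched MGET rather than per-key gets:

```python
# Two-step feed read: timeline cache yields tweet IDs; hydration cache
# yields full objects; hydration misses fall back to the tweet store
# and are re-cached for the next reader.

timeline_cache = {"timeline:u_1": ["t_3", "t_2", "t_1"]}   # newest first
hydration_cache = {"t_3": {"text": "newest"}, "t_1": {"text": "oldest"}}

def hydrate(tweet_id, fetch_from_store):
    obj = hydration_cache.get(tweet_id)
    if obj is None:                        # hydration miss
        obj = fetch_from_store(tweet_id)   # slow path: Cassandra point read
        hydration_cache[tweet_id] = obj    # re-cache
    return obj

def load_feed(user_id, fetch_from_store):
    ids = timeline_cache.get(f"timeline:{user_id}", [])
    return [hydrate(t, fetch_from_store) for t in ids]

feed = load_feed("u_1", lambda t: {"text": f"store:{t}"})
```

Note that a like on t_3 only rewrites `hydration_cache["t_3"]`; the timeline entry is untouched — exactly the separation the Hydration Cache paragraph describes.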

Architectural rationale

Why Cassandra for the tweet store? Database choice

The tweet store has one dominant access pattern: range scans by user_id ordered by tweet_id (time). Cassandra's partition key + clustering column model serves this exactly. Writes are append-only (tweets are immutable once published), which plays to Cassandra's LSM-tree write path. Multi-DC replication gives geographic durability without a separate replication layer.

Tradeoff: Cassandra sacrifices strong consistency (eventual by default). Tweet reads are idempotent and can tolerate brief staleness. But operations that require strict ordering across partitions — such as checking a global trend — are awkward.
Alternatives: MySQL (Twitter's original) · DynamoDB · Google Bigtable
Why separate tweet IDs from tweet objects in cache? Cache design

Storing full tweet objects in the timeline cache would mean every like or retweet invalidates the timeline entries for all followers — millions of cache writes per engagement. By separating "timeline = list of IDs" from "hydration cache = tweet objects," engagement updates only touch the hydration cache. Timelines remain stable and cheap to maintain.

Tradeoff: Every feed read now requires two cache lookups: one for the timeline, one for object hydration. The hydration step can be parallelised but adds a small latency overhead (~10–20 ms in practice) compared to a single lookup.
Why Kafka between Write API and Fan-out? Durability

Without a queue, the write API would need to synchronously fan-out to potentially millions of followers before returning a response — making tweet latency proportional to follower count. Kafka decouples the tweet write (durable, fast) from fan-out (potentially slow, bursty). It also gives fan-out workers the ability to replay events after failures, and absorbs the spike from celebrity tweets without backpressure on the write path.

Tradeoff: Asynchronous fan-out means followers don't see the tweet instantly — there is a propagation delay. This is typically acceptable (and necessary) but must be acknowledged in §2's freshness NFR discussion.
Alternatives: RabbitMQ · AWS SQS+SNS · Google Pub/Sub

Real-world comparison

Decision | This design | Twitter (historical) | Facebook News Feed
Fan-out model | Hybrid push/pull | Hybrid (fan-out on write + celebrity pull) | Primarily fan-out on read (TAO graph)
Timeline store | Redis sorted sets | Redis lists (Flock timeline) | Ranked feed server (in-memory)
Tweet/post store | Cassandra | Manhattan (custom key-value store) | MySQL + Haystack (object storage for media)
Graph store | FlockDB-style (adjacency-list service) | FlockDB (custom adjacency-list graph store) | TAO (Facebook's graph database)
Ranking | Chronological (L3) / ML ranker (L5+) | Algorithmic (Twitter's recommendation engine) | EdgeRank → deep ML model
ℹ️

Neither approach is universally correct. Facebook's read-time fan-out works because their post rate is lower per user and their ranking computation is more complex. Twitter's write-time fan-out works because tweet consumption vastly outpaces tweet production — pre-computing makes reads cheap. The right model follows directly from the read:write ratio established in §3.

05

Fan-out strategy

Fan-out is the central algorithm of a news feed: when a tweet is published, which followers see it and when? There are two extremes and a hybrid that nearly every production social platform converges on.

① Fan-out on Write (Push model): tweet arrives → push to all N follower caches. ✓ Reads: O(1) cache hit. ✗ Writes: O(N followers). ✗ Celebrity problem. Best for: normal users.

② Fan-out on Read (Pull model): user opens feed → merge timelines of all followees. ✓ Writes: O(1). ✓ Fresh (up to cache TTL). ✗ Reads: O(followees × depth). Best for: celebrity accounts.

③ Hybrid (production default): followers < threshold → push fan-out; followers ≥ threshold → pull at read time, merging celebrity tweets into the feed on GET /feed. Threshold: ~1M followers (tuned by write capacity).
Three fan-out strategies. Push (①) is fast to read; pull (②) is cheap to write; hybrid (③) caps worst-case write amplification by pulling celebrity tweets at read time.

Our choice for this system is the hybrid. The 200 ms latency NFR from §2 rules out pure pull (assembling feeds from 1,000 followees at read time cannot reliably hit 200 ms p99). The fan-out write capacity NFR rules out pure push for accounts with millions of followers. The hybrid threads both constraints.

🎯

Classic follow-up: "What happens when a normal user follows a celebrity?" The follow event triggers a backfill: the system fetches the celebrity's recent N tweets and injects their IDs into the new follower's timeline cache. This is a background operation, gated on a queue, and must be idempotent (safe to re-run if it fails midway).
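A minimal sketch of that backfill, with a dict standing in for the Redis sorted set. The idempotency comes for free from ZADD semantics — re-adding an existing member just rewrites the same score — so a run that crashes midway can simply be retried. The function name and parameters are illustrative:

```python
def backfill_on_follow(timeline, recent_celebrity_tweet_ids, n=50):
    """Inject the celebrity's last n tweet IDs into a new follower's
    timeline cache. Re-adding an existing member rewrites the same score
    (ZADD semantics), so a partially failed run is safe to re-run."""
    for tweet_id in recent_celebrity_tweet_ids[:n]:
        timeline[tweet_id] = tweet_id   # score = Snowflake ID (time-ordered)
    return timeline

follower_timeline = {900: 900}                          # existing entry
backfill_on_follow(follower_timeline, [1003, 1002, 1001])
# Re-running the same backfill produces the identical timeline.
again = backfill_on_follow(dict(follower_timeline), [1003, 1002, 1001])
```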

Hybrid threshold selection — how to tune it

The threshold is not a fixed number — it is a function of fan-out worker cluster capacity. At 100K fan-out writes/second, a celebrity with 10M followers would take 100 seconds to propagate fully under pure push. Setting the threshold at 1M followers caps the worst-case single-event fan-out at ~10 seconds. That still exceeds the 5 s freshness NFR, which is why that target must be treated as a steady-state median rather than a hard bound.

Engineering reality: In production, a tiered approach gives finer control: push for accounts under 100K followers, throttled push for 100K–1M, and pure pull above 1M. The threshold is a runtime-configurable flag on the user graph entry, not a code change. A tiered model lets you tune freshness vs infrastructure cost per follower-count bracket.
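The tiered policy reduces to a tiny dispatch function. The brackets below are the illustrative values from the note above; in production they would be loaded from a runtime-configurable flag rather than hard-coded:

```python
PUSH_LIMIT = 100_000        # full push fan-out below this
THROTTLE_LIMIT = 1_000_000  # rate-limited push up to this; pure pull above

def fanout_mode(follower_count: int) -> str:
    """Pick the fan-out strategy for a tweet's author by follower count."""
    if follower_count < PUSH_LIMIT:
        return "push"
    if follower_count < THROTTLE_LIMIT:
        return "throttled-push"
    return "pull"   # celebrity: merged into followers' feeds at read time
```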
05b

API design

Two core endpoints drive the system. Both carry implicit authentication via a JWT or session token validated at the API gateway layer before reaching these services.

POST /tweets — Create a tweet

// Request
POST /v1/tweets
Authorization: Bearer <token>
Content-Type: application/json

{
  "text": "Hello world — 280 chars max",
  "media_ids": ["m_abc123"],          // pre-uploaded media references
  "reply_to_tweet_id": null,          // null if top-level tweet
  "idempotency_key": "uuid-v4"        // client-generated; prevents duplicate posts
}

// Response — 201 Created
{
  "tweet_id": "1843927401234567168",
  "created_at": "2026-03-29T10:00:00Z",
  "author_id": "u_42",
  "fan_out_status": "queued"          // fan-out is async; not yet delivered
}

Validation rules: text must be 1–280 characters; media_ids must reference pre-uploaded objects (not inline uploads); idempotency_key deduplication window is 24 hours. Rate limiting is enforced at the API gateway: 300 tweets per 3-hour rolling window per user, counted via a sliding window counter in Redis.
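A sketch of that sliding-window limiter. In production the per-user event log would live in a Redis sorted set (ZADD a timestamp per tweet, ZREMRANGEBYSCORE to expire old entries, ZCARD to count); a plain Python list stands in here:

```python
WINDOW_S = 3 * 3600   # 3-hour rolling window
LIMIT = 300           # max tweets per window per user

def allow_tweet(events: list, now: float) -> bool:
    """events is this user's timestamp log, mutated in place.
    Returns False when the 300-per-3h budget is exhausted."""
    cutoff = now - WINDOW_S
    events[:] = [t for t in events if t > cutoff]   # expire old entries
    if len(events) >= LIMIT:
        return False        # reject: HTTP 429 at the gateway
    events.append(now)
    return True
```

Unlike a fixed-window counter, the sliding window cannot be gamed by bursting at a window boundary: the 300-tweet budget always covers exactly the trailing three hours.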

GET /feed — Home timeline

// Request
GET /v1/feed?limit=20&cursor=eyJsYXN0X3R3ZWV0X2lkIjoiMTg0MDAwIn0
Authorization: Bearer <token>

// Response — 200 OK
{
  "tweets": [
    {
      "tweet_id": "1843927401234567168",
      "text": "Hello world",
      "author": { "user_id": "u_42", "handle": "@alice" },
      "created_at": "2026-03-29T10:00:00Z",
      "engagement": { "likes": 128, "retweets": 34 },
      "media": []
    }
    // ... up to limit items
  ],
  "next_cursor": "eyJsYXN0X3R3ZWV0X2lkIjoiMTgzOTAwIn0",
  "has_more": true
}

Cursor-based pagination (not offset) is mandatory here — offset pagination would require counting rows from the beginning of the timeline on every page, which is expensive on distributed caches. The cursor encodes the last seen tweet_id; the feed service continues from that point.
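The opaque cursors in the example above are just base64-encoded JSON carrying the last-seen tweet_id. A minimal encode/decode/page sketch (the linear `.index` scan is for illustration only — against a Redis sorted set, the decoded ID would seed a ZREVRANGEBYSCORE instead):

```python
import base64, json

def encode_cursor(last_tweet_id: str) -> str:
    payload = json.dumps({"last_tweet_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> str:
    return json.loads(base64.urlsafe_b64decode(cursor))["last_tweet_id"]

def page(timeline_ids, limit, cursor=None):
    """timeline_ids is newest-first; resume strictly after the cursor's ID."""
    start = 0 if cursor is None else timeline_ids.index(decode_cursor(cursor)) + 1
    chunk = timeline_ids[start:start + limit]
    next_cursor = encode_cursor(chunk[-1]) if chunk else None
    return chunk, next_cursor, (start + limit) < len(timeline_ids)
```

Because the cursor names a position rather than an offset, pages stay stable even as new tweets are prepended to the timeline between requests.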

Endpoint | Level | Notes
GET /users/:id/tweets | L3/L4 | User's own timeline; served directly from tweet store
DELETE /tweets/:id | L3/L4 | Soft-delete; triggers async cache invalidation across fan-out timelines
POST /tweets/:id/like | L5 | Idempotent; counter update via async event; returns optimistic count
GET /feed/delta?since=tweet_id | L5 | Returns new tweets since last poll — used by clients for real-time updates
GET /feed?algorithm=ranked | L7/L8 | Toggle between chronological and ML-ranked feed
06

Core read flow

The latency NFR from §2 (200 ms p99) is determined almost entirely by the read path — specifically by the two sequential cache lookups (timeline IDs, then tweet objects). The write path is comparatively relaxed; a 500 ms write is acceptable. The read path is where the architecture either succeeds or fails at scale.

Feed read flow:
1. GET /feed (client) → fetch timeline IDs from Redis.
2. Cache hit → celebrity merge (pull and interleave celebrity tweets) → hydrate tweet objects (batch, parallel from Redis).
3. Cache miss → rebuild from tweet store: query Cassandra (multi-followee fan-in), backfill timeline cache (async, background).
4. Sort (chronological or ML-ranked; apply ranking if enabled) → return paginated feed.
Feed read flow: timeline IDs fetched from Redis, celebrity tweets merged on the fly (hybrid fan-out), tweet objects hydrated in parallel, sorted and returned.

The key tradeoff in this flow is the cache-miss path. A cold cache (user hasn't loaded their feed in days) requires querying the tweet store for each followee's recent tweets, merging and sorting them, then populating the cache — potentially 200–500 ms. To hit 200 ms p99, the cache-miss path must rarely execute: once more than 1% of requests miss, the p99 becomes the miss-path latency itself, so the miss rate must be held well under 1% (or the miss path driven below 200 ms). This is why timelines are preheated for active users in the background (§8) rather than rebuilt lazily on first request.
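The percentile arithmetic behind that constraint — why the miss rate, not the average, decides the p99 — fits in a few lines (the latency values are illustrative):

```python
HIT_MS, MISS_MS = 50, 400   # fast path vs cache-miss rebuild (illustrative)

def p99_latency(miss_rate: float, n: int = 10_000) -> int:
    """p99 over n requests with a fixed hit/miss latency split."""
    misses = int(n * miss_rate)
    latencies = sorted([HIT_MS] * (n - misses) + [MISS_MS] * misses)
    return latencies[int(n * 0.99)]

# At a 2% miss rate the p99 IS the miss path (400 ms);
# below 1% the p99 stays on the hit path (50 ms).
```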

💡

Chronological vs ranked: this flow shows chronological ordering (sort by tweet_id / created_at). At L5+, you're expected to note that reverse-chronological is a special case of ranking — and that algorithmic ranking adds a re-scoring step after hydration, using a model that evaluates engagement signals, relationship strength, and content type. This is why tweet objects (not just IDs) must be hydrated before ranking can occur.

07

Data model

There are four core entities: User, Tweet, Follow, and Timeline. Each is accessed very differently, which ends up shaping the storage choice per entity.

Access patterns

Operation | Frequency | Query shape
Fetch user profile | Medium (per feed hydration) | Point lookup by user_id
Fetch tweets by user | Medium (user timeline) | Range scan: user_id, last N tweets
Fetch home timeline IDs | Very high (every feed load) | Point lookup by user_id → ordered list
Fetch tweet by ID | Extremely high (hydration) | Point lookup by tweet_id
Lookup followers by user_id | High (fan-out on write) | Range scan: tweeter_id → all follower_ids
Check follow relationship | Medium (authorization) | Point lookup: (follower_id, followee_id)
Increment like/retweet count | Extremely high | Counter increment; not per-row

Two patterns stand out from this table. First, timeline ID lookups and tweet hydration dominate by volume — both are point lookups, which maps cleanly onto caches (Redis) and key-value stores. Second, follower scans are write-path-critical but read infrequently relative to timeline reads — this justifies a specialised graph store rather than a general-purpose database.

Schema

users (relational)
  user_id BIGINT PK · handle VARCHAR(15) · display_name VARCHAR(50) · bio TEXT · follower_count INT · is_celebrity BOOL ★ · created_at TIMESTAMP · profile_img_url TEXT

tweets (Cassandra)
  user_id BIGINT PK · tweet_id BIGINT CK↓ · text TEXT · media_ids LIST<TEXT> · reply_to BIGINT NULL · like_count → counters ★ · retweet_count → counters ★ · created_at TIMESTAMP

follows (graph)
  follower_id BIGINT PK · followee_id BIGINT CK · created_at TIMESTAMP
  idx: followee_id (reverse index for fan-out lookups)

timeline cache (Redis)
  KEY timeline:{user_id} · TYPE sorted set (ZADD) · SCORE tweet_id (Snowflake → time-ordered) · MAX 800 tweet IDs per user ★

Legend:
  ★ is_celebrity — BOOL flag; fan-out workers skip push for flagged accounts
  → counters — like/retweet counts live in a separate tweet_counters table (tweet_id PK, COUNTER cols only — a Cassandra hard constraint)
  CK↓ — clustering key DESC; newest tweets returned first
  Snowflake tweet_id is monotonically increasing → safe as a Redis ZADD score for chronological order
Core data model. Four entities, three storage systems: users in a relational store, tweets in Cassandra (wide-column), follow edges in a graph store, home timelines in Redis sorted sets.
Why cap the timeline cache at 800 entries?

The timeline cache holds tweet IDs, not full objects — each ID is 8 bytes. 800 IDs × 8 bytes = 6.4 KB per user. With 200M active users, 200M × 6.4 KB ≈ 1.3 TB total — a manageable Redis cluster size. Beyond 800 entries, users requesting older tweets would trigger a cache-miss rebuild from Cassandra, which is acceptable since infinite scroll requests for old content are rare and can tolerate higher latency.

Tradeoff: 800 is an approximation calibrated for "most active users" content; a user following 5,000 accounts may burn through 800 entries in minutes if all followees tweet simultaneously. The capacity estimator in §3 can be adjusted for different active-follow distributions.
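The cap is enforced on every fan-out write. With a dict standing in for the sorted set (real code: ZADD followed by ZREMRANGEBYRANK 0 -801), the trim looks like:

```python
MAX_ENTRIES = 800   # per-user timeline cap from the schema above

def push_to_timeline(timeline: dict, tweet_id: int) -> None:
    """Add a tweet ID, then evict the oldest entry past the cap.
    Scores are Snowflake IDs, so the minimum key is the oldest tweet."""
    timeline[tweet_id] = tweet_id
    while len(timeline) > MAX_ENTRIES:
        del timeline[min(timeline)]
```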
Why are like/retweet counts in a separate table?

Cassandra enforces a hard constraint: a table with COUNTER columns cannot contain any non-counter columns (and vice versa). Placing like_count and retweet_count in the tweets table alongside text and media_ids would cause the CREATE TABLE to fail at runtime. The correct approach is a separate tweet_counters table keyed purely on tweet_id with COUNTER columns, or — as most production systems do — a dedicated counter service (Redis INCRBY or a purpose-built counter store) that decouples engagement writes from tweet reads entirely.

Production pattern: Twitter historically used a separate counter service backed by a persistent store (not Cassandra) for engagement counts, since counter reads on the hot tweet hydration path need sub-millisecond latency. Redis INCRBY on a per-tweet key satisfies this; counts are periodically checkpointed to durable storage for recovery after Redis restart.
Alternatives: Redis INCRBY · DynamoDB atomic counter · Dedicated counter shard (MySQL)
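A sketch of that buffered-counter pattern: increments land in a fast in-memory counter (Redis INCRBY in production), reads add the unflushed delta to the durable baseline, and a periodic flush checkpoints the deltas. The class and method names are illustrative:

```python
from collections import Counter

class CounterService:
    def __init__(self, durable: dict):
        self.durable = durable      # stands in for the persistent counter store
        self.pending = Counter()    # stands in for Redis INCRBY keys

    def like(self, tweet_id: str) -> None:
        self.pending[tweet_id] += 1

    def count(self, tweet_id: str) -> int:
        # Approximate by design: durable baseline + unflushed delta.
        return self.durable.get(tweet_id, 0) + self.pending[tweet_id]

    def flush(self) -> None:
        # Periodic checkpoint; bounds loss after a cache restart
        # to at most one flush interval of increments.
        for tweet_id, delta in self.pending.items():
            self.durable[tweet_id] = self.durable.get(tweet_id, 0) + delta
        self.pending.clear()
```

This is exactly the ≤ 10 s eventual-consistency contract from §2: readers may briefly see a count that lags by one flush interval, never a wrong baseline.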
Why a reverse index on follows.followee_id?

Without a secondary index (or a reverse mapping in the graph store), the fan-out lookup — all follower_ids for a given followee — would require a full table scan. The reverse index (keyed by followee_id) makes this lookup O(1) for the index and O(followers) for the scan — the only unavoidable cost.

Implementation note: In practice, Twitter's FlockDB stored both directions as separate edge lists: forward (who I follow) and reverse (who follows me). Both are sharded by user_id. Hot follower lists for celebrity accounts are cached in memory on graph service nodes to avoid repeated full scans.
08

Caching strategy

The 200 ms p99 latency NFR from §2 is only achievable because multiple cache layers eliminate database round trips on the hot path. Each cache layer has a specific role in the §4 architecture.

Request path: Client cache (~5 min TTL) → CDN / edge (static assets, avatars) → Timeline cache (Redis, tweet IDs) → miss → Hydration cache (Redis, tweet objects) → miss → Cassandra (source of truth).

Browser/app — caches rendered feed cards; avoids redundant fetches on re-open. TTL: 5 min.
Edge / CDN — profile avatars, media thumbnails; not tweet content (dynamic). TTL: 24 hr.
Timeline cache — pre-computed feed as sorted tweet IDs per user; written by the fan-out service. TTL: 7 days (LRU evict); max 800 entries.
Hydration cache — full tweet objects keyed by tweet_id; counts refreshed every ~5 s from the counter service. TTL: 24 hr.
Four-layer cache hierarchy. Each layer corresponds to its physical position in the §4 request path: client → CDN → Redis timeline (fan-out write target) → Redis hydration → Cassandra.
Cache layer | Stores | Invalidation | Miss fallback
Client cache | Rendered feed cards | TTL (5 min) + explicit pull-to-refresh | Full API request
CDN / edge | Static assets, avatars | TTL (24 hr) | Origin fetch + re-cache
Timeline cache (Redis) | Per-user ordered tweet ID lists | Fan-out service writes; delete on tweet removal; 7-day TTL | Rebuild from Cassandra (slow path)
Hydration cache (Redis) | Full tweet objects by tweet_id | Counter service pushes updates every ~5 s; 24-hr TTL | Direct Cassandra read
⚠️

Timeline cache invalidation on tweet deletion: when a tweet is deleted, the fan-out service must remove its ID from all follower timeline caches. For a user with 10M followers, this is 10M Redis DEL operations — effectively another fan-out event. The deletion is queued and processed asynchronously, so deleted tweets may remain visible in some followers' feeds for seconds to minutes. This is the deliberate choice: async deletion bounds the worst-case propagation time by the same fan-out throughput as normal writes, rather than blocking the deletion response on a synchronous sweep.

09

Deep-dive: scalability

A single-region, single-instance version of this system breaks in at least five ways as load grows. The following topics each address one of those breakage modes — they are independent of each other and can be drilled into in any order.

Production topology: CDN → LB cluster → Write API (N servers) and Read API (N servers) → Kafka cluster (partitioned by user_id) → fan-out workers (autoscaled) → Timeline Redis (cluster, 3 replicas) and Hydration Redis (cluster, 3 replicas), with the User Graph (sharded + cached), Cassandra (multi-DC, N nodes), and Analytics (Flink / ClickHouse) alongside. The entire stack is replicated across 3 geographic regions (US-East, EU-West, APAC), active-active with eventual consistency; users are routed to the nearest region, with cross-region replication of tweet events via Kafka MirrorMaker.
Production-scale architecture: every component clustered, sharded, and geo-replicated. Fan-out workers autoscale against Kafka consumer lag. Entire stack active-active across 3 regions.
Distributed tweet IDs — why not auto-increment? L5+ · ID generation

Auto-increment requires a central counter, which becomes a single point of failure and a write bottleneck at scale. Twitter-style systems use a distributed ID generator — Twitter's Snowflake generates 64-bit IDs encoding a timestamp (41 bits), datacenter ID (5 bits), machine ID (5 bits), and sequence (12 bits). IDs are time-ordered globally without coordination, which preserves sort-by-time properties in Cassandra's clustering column without a separate created_at index.

Tradeoff: Snowflake IDs are not strictly monotonic across machines — two machines generating at the same millisecond produce IDs that interleave at the sequence level. For feed ordering, this is acceptable; tweets within the same millisecond can be ordered arbitrarily.
Alternatives: UUID v7 (time-ordered) · ULID · Instagram's 64-bit scheme
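A minimal single-process generator following the bit layout above (41-bit millisecond timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence); the lock serialises callers on one node, while cross-node uniqueness comes from the datacenter/machine bits:

```python
import threading, time

TWITTER_EPOCH_MS = 1288834974657   # Twitter's custom epoch (Nov 2010)

class Snowflake:
    """64-bit IDs: 41-bit ms timestamp | 5-bit DC | 5-bit machine | 12-bit seq."""

    def __init__(self, datacenter_id: int, machine_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32
        self._node = (datacenter_id << 17) | (machine_id << 12)
        self._seq = 0
        self._last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            ms = time.time_ns() // 1_000_000
            if ms == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF     # 12-bit sequence
                if self._seq == 0:                      # 4096 IDs this ms:
                    while ms <= self._last_ms:          # spin to next ms
                        ms = time.time_ns() // 1_000_000
            else:
                self._seq = 0
            self._last_ms = ms
            return ((ms - TWITTER_EPOCH_MS) << 22) | self._node | self._seq
```

Because the timestamp occupies the high bits, numeric order equals time order — which is what makes these IDs usable directly as Redis sorted-set scores and Cassandra clustering keys.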
Fan-out worker autoscaling under celebrity bursts L5+ · Burst handling

Kafka consumer lag is the signal for fan-out worker autoscaling. When a celebrity tweets, the relevant Kafka partition's consumer lag spikes from near-zero to millions of events. An autoscaler (e.g. Kubernetes HPA driven by custom Kafka metrics via KEDA — a Kubernetes-native event-driven autoscaler) provisions additional fan-out worker pods within 30–60 seconds. The Kafka buffer absorbs the burst during this provisioning window, ensuring no events are dropped.

Freshness implication: During the 30–60 s provisioning window, fan-out lag may temporarily exceed the 5 s freshness NFR from §2. This is expected and acceptable for burst events. In steady state, lag is minimised. The 5 s NFR should be qualified as a steady-state target, not a burst-event guarantee.
Geo-replication and cross-region consistency L6+ · Multi-region

The system runs active-active across regions: users are routed to their nearest region by GeoDNS. Active-active is chosen over active-passive because the latency NFR (200 ms p99) cannot be met when a user in APAC is routed to US-East as their primary — the round-trip alone consumes half the budget. The tradeoff is cross-region consistency: tweet writes are committed locally first, then replicated to other regions via a cross-region Kafka topic (using Kafka MirrorMaker or a similar replication bridge). Cassandra's multi-datacenter replication handles tweet storage. Timeline caches are populated per-region — a user switching regions may see a slightly stale feed until the regional cache is populated from the regional tweet store.

Tradeoff: Active-active with eventual consistency means a tweet posted in APAC may not be visible in EU for 2–5 seconds. This is generally acceptable; the alternative (synchronous cross-region writes) would add 100–200 ms to every tweet's write latency.
ML-based feed ranking at scale L6/L7 · Ranking

Algorithmic ranking takes the candidate set from the timeline cache (~200 tweets) and scores each using a model that evaluates recency, author relationship strength, tweet engagement rate, and content type (video > images > text in engagement curves). The model runs as a low-latency inference service. Features are served from a feature store (e.g. Feast — an open-source ML feature store, or a Redis-backed custom implementation) that caches pre-computed user and tweet features to keep feature retrieval under 10 ms.

Tradeoff: ML ranking adds ~20–50 ms to feed generation — acceptable within the 200 ms budget. The model objective function (engagement maximisation vs session satisfaction vs advertiser value) is a product decision that significantly shapes which tweets surface. An interviewer at L7/L8 will ask you to articulate this tradeoff explicitly.
Approaches: Two-tower retrieval (user embedding ↔ tweet embedding) · Logistic regression (fast) · GNN (relationship-aware)
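The logistic-regression approach above is the simplest to sketch: score each candidate with a weighted linear combination of features, squash through a sigmoid, and sort. Feature names and weights here are illustrative assumptions, not Twitter's actual model.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: int
    age_hours: float          # recency
    author_affinity: float    # relationship strength, 0..1
    engagement_rate: float    # engagements per impression, 0..1
    is_video: bool            # content-type signal

# Hypothetical weights; in production these come from model training.
WEIGHTS = {"recency": -0.3, "affinity": 2.0, "engagement": 1.5, "video": 0.4}

def score(c: Candidate) -> float:
    """Linear combination of features, mapped to P(engage) via sigmoid."""
    z = (WEIGHTS["recency"] * c.age_hours
         + WEIGHTS["affinity"] * c.author_affinity
         + WEIGHTS["engagement"] * c.engagement_rate
         + WEIGHTS["video"] * (1.0 if c.is_video else 0.0))
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates: list[Candidate]) -> list[Candidate]:
    return sorted(candidates, key=score, reverse=True)
```

Note the built-in fallback: if the ranker's feature store is unavailable, sorting by `age_hours` alone recovers the chronological feed.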
Redis cluster sharding for timeline at scale L5+ · Sharding

A single Redis instance cannot hold 200M user timelines. Redis Cluster shards data across N nodes by hashing each key (timeline:{user_id}) into one of 16,384 hash slots, which are distributed across the nodes. Each shard is replicated to 2–3 replica nodes for availability. At ~1.3 TB total timeline cache size (from §3), a 20-node cluster with 100 GB of memory per node provides comfortable headroom once a 1.5× allowance for overhead and replication is included.

Hot shard risk: Celebrity user_ids that appear frequently in fan-out operations (as the tweeter, not the follower) do not create hot shards — fan-out writes target follower timelines, which are distributed across the cluster by follower user_id. The celebrity's own timeline key is rarely written.
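The node count quoted above is easy to sanity-check with back-of-envelope arithmetic; the constants simply restate the figures from this section.

```python
import math

TIMELINE_BYTES = 1.3e12       # ~1.3 TB total timeline cache (from §3)
OVERHEAD_FACTOR = 1.5         # allowance for replication + Redis overhead
NODE_MEMORY_BYTES = 100e9     # 100 GB usable memory per node

nodes = math.ceil(TIMELINE_BYTES * OVERHEAD_FACTOR / NODE_MEMORY_BYTES)
print(nodes)  # -> 20
```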
10

Failure modes & edge cases

ScenarioProblemSolutionLevel
Timeline cache node failure Followers' pre-computed timelines unavailable; all reads miss cache Redis cluster failover to replicas; fall back to reading directly from Cassandra while cache rebuilds L3/L4
Fan-out backlog after worker crash Kafka consumer lag grows; followers don't see new tweets Kafka's offset commit ensures at-least-once delivery; new workers resume from last committed offset L3/L4
Duplicate tweet from retry Client retries POST /tweet on network failure; tweet posted twice Idempotency key on request; server deduplicates within 24-hour window using Redis SET NX L5
User follows celebrity; backfill delay New follower sees empty feed or misses celebrity's recent tweets Background backfill job on follow event; injects last N=50 celebrity tweets into new follower's timeline cache L5
Celebrity tweet during live event (50M+ followers) Fan-out workers overwhelmed; freshness NFR breached for normal users Celebrity accounts always use pull path; worker autoscaling handles smaller celeb accounts; rate-limit fan-out per second L5
Cross-region timeline inconsistency User sees different feeds depending on which region they hit Consistent routing by user_id to primary region; regional cache eventually converges via Kafka replication L5
Tweet deletion lag in timelines Deleted tweet visible in follower feeds for seconds to minutes Async deletion fan-out queued via same Kafka path; client-side soft-delete flag on API response for acknowledged deletes L5
User unfollows an account Unfollowed user's previously fanned-out tweet IDs remain in the follower's timeline cache, showing stale content Read-time filter: cross-check tweet authors against the current follow set on each feed fetch (preferred — no write amplification). Proactive purge via async Kafka event is possible but expensive for high-volume tweeters. L5
Counter divergence (like counts) Counter service lag causes like counts to differ across regions by >10 s Bounded staleness accepted in NFR (§2); counter sync frequency tunable; display as approximate above 1,000 (e.g. "1.2K likes") where rounding conceals the cross-region divergence L7/L8
Ranking model degradation ML ranker serves stale features or crashes; feed quality degrades silently Circuit breaker on ranker service; fall back to chronological ordering on failure; shadow mode testing for model updates L7/L8
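The duplicate-tweet guard from the table can be sketched concisely. In production the claim would be a single Redis command, `SET idem:<key> 1 NX EX 86400`; here an in-memory dict with expiry stands in for Redis so the sketch is self-contained. Key naming and helper names are illustrative assumptions.

```python
import time
from typing import Optional

DEDUP_TTL_SECONDS = 24 * 3600
_seen: dict[str, float] = {}   # idempotency key -> expiry timestamp

def claim(idempotency_key: str, now: Optional[float] = None) -> bool:
    """Mimics Redis SET NX EX: True on first claim, False for duplicates."""
    now = time.time() if now is None else now
    expiry = _seen.get(idempotency_key)
    if expiry is not None and expiry > now:
        return False                    # key still held: duplicate request
    _seen[idempotency_key] = now + DEDUP_TTL_SECONDS
    return True

def post_tweet(idempotency_key: str, payload: dict) -> bool:
    """Returns True if accepted, False if the request was a retry."""
    if not claim(idempotency_key):
        return False                    # retried POST /tweet becomes a no-op
    # ... persist to Cassandra and enqueue the fan-out event here ...
    return True
```

Because the claim happens atomically in Redis (NX succeeds only if the key is absent), concurrent retries race safely: exactly one wins.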
11

How to answer by level

L3 / L4 Working system with push fan-out
What good looks like
  • Define functional requirements (post, read feed, follow)
  • Simple fan-out on write to a timeline table or cache
  • Two API endpoints with basic schemas
  • Identify that reads vastly outnumber writes (read-heavy system)
  • Mention Redis for timeline storage
Where L3/L4 typically falls short
  • Accepts the fan-out design without questioning write amplification for large followings
  • Skips capacity estimation — draws boxes without sizing the fan-out QPS or cache footprint
  • Stores full tweet objects in the timeline cache, coupling engagement updates to timeline invalidation
L5 Tradeoffs in fan-out strategy
What good looks like
  • Articulate push/pull/hybrid tradeoffs without prompting
  • Explain the follower threshold and how it's tuned
  • Discuss cursor pagination vs offset and why
  • Bring up idempotency on the write API
  • Note that engagement counts are eventually consistent and explain why
Where L5 typically falls short
  • Describes the hybrid threshold as a fixed number rather than a capacity-planning function
  • Discusses caching as a single layer rather than two independent caches with separate invalidation paths
  • Omits ranking entirely, or treats chronological order as the only option
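The cursor-pagination point from the L5 checklist is worth making concrete. A minimal sketch, assuming the timeline is a newest-first list of (timestamp, tweet_id) pairs: the cursor is the last entry served, so the next page stays stable even as new tweets are prepended, which is exactly where numeric offsets break.

```python
from typing import Optional

Cursor = tuple[int, int]  # (timestamp, tweet_id), illustrative shape

def next_page(timeline: list[Cursor],
              cursor: Optional[Cursor],
              page_size: int = 20) -> tuple[list[Cursor], Optional[Cursor]]:
    """timeline is sorted newest-first; tuples compare (ts, id) lexically."""
    if cursor is not None:
        # Resume strictly after the cursor position (descending order).
        timeline = [entry for entry in timeline if entry < cursor]
    page = timeline[:page_size]
    new_cursor = page[-1] if len(page) == page_size else None
    return page, new_cursor
```

With an offset, a tweet arriving between page fetches shifts every index and the client sees duplicates; the cursor anchors the position in the data itself.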
L6 End-to-end ownership including ranking and ops
What good looks like
  • Full failure mode table without prompting
  • Ranking model architecture (two-tower, feature store, latency budget)
  • Geo-replication strategy and cross-region consistency tradeoffs
  • Observability — timeline cache hit rate: a drop below 95% signals fan-out lag or cold-start pressure
  • Observability — p99 feed hydration latency: a spike above 150 ms points to hydration cache misses
  • Observability — Kafka consumer lag on the fan-out topic: sustained lag above 10K events indicates worker capacity pressure
  • Graceful degradation: what happens if Redis is down?
  • Security & compliance: GDPR hard-delete propagation (delete must purge tweet from all timeline caches and the tweet store); async content moderation hook as a downstream Kafka consumer; API gateway rate limiting to prevent write-path abuse
Where L6 typically falls short
  • Frames ranking as a pure engineering choice without connecting it to the business objective it optimises for
  • Makes build-vs-buy decisions by intuition rather than reasoning through cost, operational burden, and vendor lock-in
  • Cannot estimate the system's cost profile — which component dominates the infrastructure bill and how it scales
L7 / L8 Architecture ownership and business framing
What good looks like
  • Frame the design around business objective first (engagement, DAU, revenue)
  • Challenge requirements — "do we need 5 s freshness or is 30 s acceptable to save 10× on infrastructure?"
  • Ranking objective function and its second-order effects (filter bubbles, viral amplification)
  • Cost model: what is the most expensive component and how does it scale?
  • Make strategic recommendations: "I would not build our own graph DB; use Neptune"
Distinguishing markers
  • Challenging assumptions rather than accepting them
  • Bringing in non-technical constraints (regulatory, vendor lock-in)
  • Estimating team structure and delivery timeline, not just system architecture

Classic interview probes

QuestionL3/L4 answerL5/L6 answerL7/L8 answer
"How do you handle celebrities?" Note that fan-out on write is expensive for large followings Hybrid: push below threshold, pull-merge at read time; explain threshold selection Threshold is a capacity planning function; discuss multi-tier celebrity classification (nano/micro/mega) with different handling per tier
"How does ranking work?" Reverse chronological is simplest ML ranker scores candidate set using engagement signals and relationship strength; feature store provides low-latency features Objective function choice drives user behaviour; engagement maximisation creates filter bubbles; need to balance multiple objectives explicitly
"What happens if Redis goes down?" Reads fail; feed is unavailable Redis Cluster with replicas; circuit breaker to Cassandra fallback; graceful degradation to slower but functional feed Layer failure budget into SLO; pre-warm Cassandra read path as standby; distinguish partial (single shard) from total failure
"How do you handle a user following 5,000 accounts?" Pre-compute the merged feed at fan-out time and cache it — so the read path hits a single key regardless of how many accounts the user follows. The 200 ms latency target is only achievable because reads are O(1) lookups, not O(followees) merges. Timeline cache stores pre-merged IDs; follow limits (Twitter: 5, 000 max) bound the worst case; merge at fan-out time not at read time Power-law follow distributions mean median user follows ~150; p99 case drives cache sizing; propose progressive limit enforcement at 1K, 2K, 5K
How the pieces connect
Every architectural decision traces back to a requirement or a capacity number.
01
200 ms latency NFR (§2) → feed cannot be assembled at request time from raw data → pre-computed timelines in Redis (§8) become mandatory, not optional.
02
230K fan-out writes/sec steady state (§3) + power-law follower distribution → pure push model collapses for celebrity accounts → hybrid fan-out with follower threshold (§5).
03
Eventual consistency for like counts (§2 NFR) → counter updates do not need to block tweet reads → hydration cache with async counter refresh (§8) keeps engagement counts separate from timeline IDs, so a like never invalidates a follower's pre-computed timeline.
04
5 s freshness NFR — steady-state median (§2) → fan-out must complete within the budget for typical cases → Kafka with autoscaled consumers (§9) decouples write latency from fan-out throughput; burst events may temporarily exceed the target, which is why the NFR is qualified as a median, not a hard bound.
05
Timeline IDs separated from tweet objects (§7 data model) → engagement count updates don't invalidate timelines → two Redis caches (§8) with independent invalidation paths, keeping fan-out cache stable under high engagement.
06
99.999999% tweet durability NFR (§2) → tweet store cannot be an in-memory system → Cassandra with multi-DC replication (§4, §9) as the durable source of truth, with caches acting as acceleration layers, never as the record of truth.
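The hybrid fan-out in connection 02 has a concrete shape at the read path: merge the pre-computed (push) timeline with tweets pulled live from the celebrity accounts the user follows. A minimal sketch; the data shapes and the helper name are illustrative assumptions.

```python
import heapq

def read_feed(pushed_ids: list[tuple[int, int]],
              celebrity_feeds: list[list[tuple[int, int]]],
              limit: int = 50) -> list[int]:
    """Each source is (timestamp, tweet_id) pairs, newest first.

    heapq.merge does a k-way merge, so cost is O(limit * log k) for k
    sources, not O(total entries): the pull path stays cheap at read time.
    """
    merged = heapq.merge(pushed_ids, *celebrity_feeds,
                         key=lambda entry: entry[0], reverse=True)
    return [tweet_id for _, tweet_id in list(merged)[:limit]]
```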