System Design Interview

Design a Twitter / X News Feed

Simple to describe, impossible to scale — the fan-out problem is the heart of every social feed interview.

Levels covered: L3 (Working System) · L5 (Tradeoffs) · L7 (Architecture Ownership)
~35 min read
A stick figure shouting into a megaphone at a massive crowd of followers, illustrating the fan-out problem
01

What the interviewer is testing

The news feed question is a proxy for one real question: do you understand write amplification? Any system that must broadcast one event to many recipients has to decide whether to do work at write time or at read time. The tradeoffs are not obvious, and they change depending on the data distribution — specifically, whether users follow power-law distributions (a handful of accounts with tens of millions of followers dominating the write path).

Three secondary signals matter here. A naive design — query the database on every feed load — simply does not work at this scale; the numbers make that obvious before any architecture is drawn. Cache consistency is non-trivial in a system with two independent write paths (fan-out writes timeline IDs, engagement writes update tweet objects); recognising that the two caches must be invalidated separately is a design smell test. And at L5+, ranking is the difference between a feed that feels alive and one that feels like a log dump — the system must be built to support re-scoring even if the initial design is chronological.

Level | The core question | What gets you to the next level
L3/L4 | Can you design a working feed with basic fan-out? | Identifying the celebrity write-amplification problem before the interviewer does
L5 | Can you articulate the push/pull tradeoff and the hybrid? | Discussing ranking signals and the cost of algorithmic re-ranking at scale
L6 | Can you own the whole design including observability? | Addressing cross-region replication, backfill on follow, and graceful degradation
L7/L8 | Should we build this, and what does the business optimise for? | Framing the ranking objective function and its business consequences
02

Requirements clarification

Before drawing a single box, pin down scope. A news feed interview can go many directions — live sports scores, Instagram-style media, LinkedIn professional posts — and each changes the design materially.

Functional requirements

Requirement | In scope | Out of scope (today)
Post a tweet (text, image, video link) | ✓ | Live video streaming
Home timeline (aggregated feed of followees) | ✓ | Ads insertion / sponsored content
User timeline (profile page, own tweets) | ✓ | DMs, Spaces
Follow / unfollow | ✓ | Lists, algorithmic discovery
Like, retweet, reply counts | ✓ (eventual) | Full comment threading
Real-time new tweet notification | ✓ (best effort) | Push notification delivery guarantees

Non-functional requirements

NFR | Target | Why this number
Feed load latency (p99) | < 200 ms | User perception threshold; beyond this, engagement drops measurably
Feed freshness | < 5 s lag after post (steady-state median) | Social context; seeing a 30-second-old tweet during a live event feels broken. Applies in steady state — celebrity bursts and autoscaling windows will occasionally exceed it.
Availability | 99.99% | ~52 min/year downtime; social feeds are not life-critical but user expectation is "always on" — which means Redis and API failover must complete within 30 s to stay within the annual budget.
Consistency (engagement counts) | Eventual, ≤ 10 s | Like counts are not transactional; approximate counts are acceptable and expected
Durability (tweets) | 99.999999% (8 nines) | Losing a tweet is a trust-destroying user-visible event
Write scalability | 100 K tweets/s burst | Live events (World Cup, elections) spike the write path by 50–100×
Why 200 ms for feed load?

200 ms is the threshold at which users consciously perceive a delay as "the app is loading." Below it, the feed feels instant. Above it, engagement starts dropping — Twitter's own research cited double-digit session-time reductions when median feed latency exceeded 300 ms.

What this drives: Pre-computed timelines stored in an in-memory cache (§8). Assembling a feed by querying a database at request time — even with indexes — cannot reliably hit 200 ms p99 at scale without caching.
Why eventual consistency for like counts?

Like counts are social signals, not financial ledgers. If two users see 1,402 vs 1,403 likes simultaneously, neither is harmed. Enforcing strong consistency on every like would require distributed locking on every engagement event — an extreme cost for no real user benefit.

What this drives: Counter updates via a message queue (Kafka) and a counter aggregation service (§4). Counts are written to an eventually-consistent store and periodically flushed. This decouples engagement volume from tweet-read latency.
Why 5 s freshness?

Five seconds is a practical boundary between "real-time" and "delayed" in a social context. During live events, even 10–15 s lag can mean users see spoilers elsewhere first and lose trust in the platform as a live information source. This is a steady-state median target, not a worst-case bound — celebrity-tweet bursts and autoscaling provisioning windows (30–60 s) will occasionally exceed it, and that is expected and acceptable.

What this drives: Fan-out via an asynchronous message queue with low-latency consumers (§5). Cache pre-computation must complete within the freshness budget for the typical case. This rules out lazy pull-only timelines as the sole strategy, but does not require synchronous fan-out before returning a tweet response.
03

Capacity estimation

Twitter-scale systems are write-amplified: one tweet may need to be placed into millions of follower timelines. The defaults below reflect a large-scale Twitter-like platform — 100M tweets per day, 200M active users — roughly comparable to Twitter's 2019–2020 scale. Adjust the defaults to match the scale you're discussing in your interview.

Four dimensions matter here. Traffic dominates the cache sizing and fan-out throughput. Storage is driven by tweet retention and media references. The read:write ratio determines how heavily the cache must absorb reads relative to writes — on social platforms this is typically 30–100×. The critical output is fan-out QPS — the rate at which timeline entries are written when tweets arrive.

Capacity estimator — defaults and derived values

Input | Default
Tweets per day | 100 M
Avg followers per user | 200
Read:write ratio | 30×
Tweet size | 600 B
Retention | 5 yrs
Active users | 200 M (fixed)

Output | Value
Write QPS | ~1.2 K tweets/sec
Fan-out QPS | ~230 K timeline writes/sec
Read QPS | ~35 K feed reads/sec
Tweet storage | ~110 TB (tweet rows)
Timeline cache size | ~1.3 TB (800 IDs/user, 8 B/ID)
Peak write QPS | ~23 K tweets/sec (20× surge)

⚠️

The fan-out number is the critical output. With 200 avg followers, 100M tweets/day yields ~230K timeline writes per second in steady state. A single celebrity tweet to 50M followers pushes 50M writes in seconds — roughly the entire platform's steady-state fan-out in one event. This is why the hybrid fan-out strategy (§5) exists.
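The arithmetic behind these figures fits in a few lines, using the §3 defaults as inputs:

```python
# Back-of-envelope estimation from the §3 defaults: 100M tweets/day,
# 200 avg followers, 30x read:write, 600 B/tweet, 5-year retention,
# 200M active users, 800 cached IDs of 8 bytes each.

TWEETS_PER_DAY = 100_000_000
AVG_FOLLOWERS = 200
READ_WRITE_RATIO = 30
TWEET_BYTES = 600
RETENTION_DAYS = 5 * 365
ACTIVE_USERS = 200_000_000
CACHED_IDS, ID_BYTES = 800, 8

write_qps = TWEETS_PER_DAY / 86_400                    # ~1.2K tweets/s
fanout_qps = write_qps * AVG_FOLLOWERS                 # ~230K timeline writes/s
read_qps = write_qps * READ_WRITE_RATIO                # ~35K feed reads/s
storage_tb = TWEETS_PER_DAY * TWEET_BYTES * RETENTION_DAYS / 1e12  # ~110 TB
cache_tb = ACTIVE_USERS * CACHED_IDS * ID_BYTES / 1e12             # ~1.3 TB

print(f"write {write_qps:,.0f}/s, fan-out {fanout_qps:,.0f}/s, "
      f"read {read_qps:,.0f}/s, storage {storage_tb:.0f} TB, "
      f"timeline cache {cache_tb:.2f} TB")
```

Changing one input (say, average followers to 2,000 for a B2B network) immediately shows how the fan-out QPS, not the raw write QPS, is the number that breaks first.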

04

High-level architecture

[High-level architecture diagram]
High-level architecture. Write path: Client → LB → Write API → Tweet Store+Kafka → Fan-out → Timeline Cache. Fan-out reads the User Graph (up, same column) and signals the Notification Gateway (dotted). Read API hits the Timeline Cache (right) then the Hydration Cache below it. Gateway pushes real-time events back to the client (red).

Component breakdown

Load Balancer routes traffic to write and read API clusters separately. This allows independent scaling: write capacity scales with tweet volume, read capacity with user sessions.

Write API validates the tweet (content, media references, rate limits), writes durably to the tweet store, then emits a fan-out event to a durable message queue (Kafka). The response returns to the client after the durable write — not after fan-out completes. Media is uploaded separately via a pre-signed object store URL (e.g. Amazon S3) before the tweet is posted; the media_ids field carries references to already-uploaded objects, served to end users through a globally distributed CDN.

Tweet Store is a wide-column database (Apache Cassandra) keyed on user_id + tweet_id. This access pattern — "give me all tweets by user X" — maps naturally onto Cassandra's partition model. Cassandra's multi-datacenter replication satisfies the 8-nine durability NFR from §2.

Message Queue (Kafka) buffers tweet events between the write API and fan-out consumers. It absorbs bursty celebrity-tweet spikes, ensures at-least-once delivery to fan-out workers, and provides a replay mechanism for consumer failures.

Fan-out Service consumes from Kafka, looks up the tweeter's follower list in the user graph, and writes a tweet ID reference into each follower's pre-computed timeline in the timeline cache. Normal users trigger push fan-out; celebrities (above a follower threshold) are skipped and handled at read time (§5).

User Graph stores follow relationships. Reads are extremely hot — every fan-out needs the full follower list for a user. A graph database (or a custom adjacency-list service like Twitter's FlockDB) is optimised for this traversal. The follower list for an active user is cached in memory.

Timeline Cache (Redis) stores pre-computed home timelines as sorted lists of tweet IDs per user. Each list holds ~800 recent tweet IDs (not full objects). Reads hit this cache first; a cache miss triggers a slower rebuild from the tweet store.

Hydration Cache (Redis) stores full tweet objects, keyed by tweet ID. When the read API fetches tweet IDs from the timeline cache, it hydrates them in a parallel batch from this cache, falling back to the tweet store. This separates "what is in your feed" from "what does each tweet contain," allowing engagement counts to update without invalidating timelines.

Notification Gateway handles real-time delivery to connected clients via a persistent connection protocol (WebSocket or Server-Sent Events). When the fan-out service writes to a follower's timeline cache, it also publishes to a per-user notification topic; the gateway pushes this signal to any active client connection. Mobile clients without an active connection receive a platform push notification — APNs for iOS, FCM for Android — which wakes the app to fetch the delta from the feed service.
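The two-cache read path above (timeline cache for IDs, hydration cache for objects, Cassandra as fallback) can be sketched in a few lines. Plain dicts stand in for the two Redis clusters, and the function names are illustrative; in production the object lookup would be one batched MGET rather than per-key gets:

```python
# Two-step feed read: timeline cache yields tweet IDs; hydration cache
# yields full objects; hydration misses fall back to the tweet store
# and are re-cached for the next reader.

timeline_cache = {"timeline:u_1": ["t_3", "t_2", "t_1"]}   # newest first
hydration_cache = {"t_3": {"text": "newest"}, "t_1": {"text": "oldest"}}

def hydrate(tweet_id, fetch_from_store):
    obj = hydration_cache.get(tweet_id)
    if obj is None:                        # hydration miss
        obj = fetch_from_store(tweet_id)   # slow path: Cassandra point read
        hydration_cache[tweet_id] = obj    # re-cache
    return obj

def load_feed(user_id, fetch_from_store):
    ids = timeline_cache.get(f"timeline:{user_id}", [])
    return [hydrate(t, fetch_from_store) for t in ids]

feed = load_feed("u_1", lambda t: {"text": f"store:{t}"})
```

Note that a like on t_3 only rewrites `hydration_cache["t_3"]`; the timeline entry is untouched — exactly the separation the Hydration Cache paragraph describes.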

Architectural rationale

Why Cassandra for the tweet store? Database choice

The tweet store has one dominant access pattern: range scans by user_id ordered by tweet_id (time). Cassandra's partition key + clustering column model serves this exactly. Writes are append-only (tweets are immutable once published), which plays to Cassandra's LSM-tree write path. Multi-DC replication gives geographic durability without a separate replication layer.

Tradeoff: Cassandra sacrifices strong consistency (eventual by default). Tweet reads are idempotent and can tolerate brief staleness. But operations that require strict ordering across partitions — such as checking a global trend — are awkward.
Alternatives: MySQL (Twitter's original) · DynamoDB · Google Bigtable
Why separate tweet IDs from tweet objects in cache? Cache design

Storing full tweet objects in the timeline cache would mean every like or retweet invalidates the timeline entries for all followers — millions of cache writes per engagement. By separating "timeline = list of IDs" from "hydration cache = tweet objects," engagement updates only touch the hydration cache. Timelines remain stable and cheap to maintain.

Tradeoff: Every feed read now requires two cache lookups: one for the timeline, one for object hydration. The hydration step can be parallelised but adds a small latency overhead (~10–20 ms in practice) compared to a single lookup.
Why Kafka between Write API and Fan-out? Durability

Without a queue, the write API would need to synchronously fan-out to potentially millions of followers before returning a response — making tweet latency proportional to follower count. Kafka decouples the tweet write (durable, fast) from fan-out (potentially slow, bursty). It also gives fan-out workers the ability to replay events after failures, and absorbs the spike from celebrity tweets without backpressure on the write path.

Tradeoff: Asynchronous fan-out means followers don't see the tweet instantly — there is a propagation delay. This is typically acceptable (and necessary) but must be acknowledged in §2's freshness NFR discussion.
Alternatives: RabbitMQ · AWS SQS+SNS · Google Pub/Sub

Real-world comparison

Decision | This design | Twitter (historical) | Facebook News Feed
Fan-out model | Hybrid push/pull | Hybrid (fan-out on write + celebrity pull) | Primarily fan-out on read (TAO graph)
Timeline store | Redis sorted sets | Redis lists (Flock timeline) | Ranked feed server (in-memory)
Tweet/post store | Cassandra | Manhattan (custom key-value store) | MySQL + Haystack (object storage for media)
Graph store | FlockDB-style (adjacency-list service) | FlockDB (custom adjacency-list graph store) | TAO (Facebook's graph database)
Ranking | Chronological (L3) / ML ranker (L5+) | Algorithmic (Twitter's recommendation engine) | EdgeRank → deep ML model
ℹ️

Neither approach is universally correct. Facebook's read-time fan-out works because their post rate is lower per user and their ranking computation is more complex. Twitter's write-time fan-out works because tweet consumption vastly outpaces tweet production — pre-computing makes reads cheap. The right model follows directly from the read:write ratio established in §3.

05

Fan-out strategy

Fan-out is the central algorithm of a news feed: when a tweet is published, which followers see it and when? There are two extremes and a hybrid that nearly every production social platform converges on.

① Fan-out on Write (Push model): tweet arrives → push to all N follower caches. ✓ Reads: O(1) cache hit. ✗ Writes: O(N followers). ✗ Celebrity problem. Best for: normal users.

② Fan-out on Read (Pull model): user opens feed → merge timelines of all followees. ✓ Writes: O(1). ✓ Fresh (up to cache TTL). ✗ Reads: O(followees × depth). Best for: celebrity accounts.

③ Hybrid (production default): followers < threshold → push fan-out; followers ≥ threshold → pull at read time, merging celebrity tweets into the feed on GET /feed. Threshold: ~1M followers (tuned by write capacity).
Three fan-out strategies. Push (①) is fast to read; pull (②) is cheap to write; hybrid (③) caps worst-case write amplification by pulling celebrity tweets at read time.

Our choice for this system is the hybrid. The 200 ms latency NFR from §2 rules out pure pull (assembling feeds from 1,000 followees at read time cannot reliably hit 200 ms p99). The fan-out write capacity NFR rules out pure push for accounts with millions of followers. The hybrid threads both constraints.

🎯

Classic follow-up: "What happens when a normal user follows a celebrity?" The follow event triggers a backfill: the system fetches the celebrity's recent N tweets and injects their IDs into the new follower's timeline cache. This is a background operation, gated on a queue, and must be idempotent (safe to re-run if it fails midway).
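A minimal sketch of that backfill, with a dict standing in for the Redis sorted set. The idempotency comes for free from ZADD semantics — re-adding an existing member just rewrites the same score — so a run that crashes midway can simply be retried. The function name and parameters are illustrative:

```python
def backfill_on_follow(timeline, recent_celebrity_tweet_ids, n=50):
    """Inject the celebrity's last n tweet IDs into a new follower's
    timeline cache. Re-adding an existing member rewrites the same score
    (ZADD semantics), so a partially failed run is safe to re-run."""
    for tweet_id in recent_celebrity_tweet_ids[:n]:
        timeline[tweet_id] = tweet_id   # score = Snowflake ID (time-ordered)
    return timeline

follower_timeline = {900: 900}                          # existing entry
backfill_on_follow(follower_timeline, [1003, 1002, 1001])
# Re-running the same backfill produces the identical timeline.
again = backfill_on_follow(dict(follower_timeline), [1003, 1002, 1001])
```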

Hybrid threshold selection — how to tune it

The threshold is not a fixed number — it is a function of fan-out worker cluster capacity. At 100K fan-out writes/second, a celebrity with 10M followers would take 100 seconds to propagate fully under pure push. Setting the threshold at 1M followers caps the worst-case single-event fan-out at ~10 seconds. That still exceeds the 5 s freshness NFR, which is why that target must be treated as a steady-state median rather than a hard bound.

Engineering reality: In production, a tiered approach gives finer control: push for accounts under 100K followers, throttled push for 100K–1M, and pure pull above 1M. The threshold is a runtime-configurable flag on the user graph entry, not a code change. A tiered model lets you tune freshness vs infrastructure cost per follower-count bracket.
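The tiered policy reduces to a tiny dispatch function. The brackets below are the illustrative values from the note above; in production they would be loaded from a runtime-configurable flag rather than hard-coded:

```python
PUSH_LIMIT = 100_000        # full push fan-out below this
THROTTLE_LIMIT = 1_000_000  # rate-limited push up to this; pure pull above

def fanout_mode(follower_count: int) -> str:
    """Pick the fan-out strategy for a tweet's author by follower count."""
    if follower_count < PUSH_LIMIT:
        return "push"
    if follower_count < THROTTLE_LIMIT:
        return "throttled-push"
    return "pull"   # celebrity: merged into followers' feeds at read time
```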
05b

API design

Two core endpoints drive the system. Both carry implicit authentication via a JWT or session token validated at the API gateway layer before reaching these services.

POST /tweets — Create a tweet

// Request
POST /v1/tweets
Authorization: Bearer <token>
Content-Type: application/json

{
  "text": "Hello world — 280 chars max",
  "media_ids": ["m_abc123"],          // pre-uploaded media references
  "reply_to_tweet_id": null,          // null if top-level tweet
  "idempotency_key": "uuid-v4"        // client-generated; prevents duplicate posts
}

// Response — 201 Created
{
  "tweet_id": "1843927401234567168",
  "created_at": "2026-03-29T10:00:00Z",
  "author_id": "u_42",
  "fan_out_status": "queued"          // fan-out is async; not yet delivered
}

Validation rules: text must be 1–280 characters; media_ids must reference pre-uploaded objects (not inline uploads); idempotency_key deduplication window is 24 hours. Rate limiting is enforced at the API gateway: 300 tweets per 3-hour rolling window per user, counted via a sliding window counter in Redis.
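A sketch of that sliding-window limiter. In production the per-user event log would live in a Redis sorted set (ZADD a timestamp per tweet, ZREMRANGEBYSCORE to expire old entries, ZCARD to count); a plain Python list stands in here:

```python
WINDOW_S = 3 * 3600   # 3-hour rolling window
LIMIT = 300           # max tweets per window per user

def allow_tweet(events: list, now: float) -> bool:
    """events is this user's timestamp log, mutated in place.
    Returns False when the 300-per-3h budget is exhausted."""
    cutoff = now - WINDOW_S
    events[:] = [t for t in events if t > cutoff]   # expire old entries
    if len(events) >= LIMIT:
        return False        # reject: HTTP 429 at the gateway
    events.append(now)
    return True
```

Unlike a fixed-window counter, the sliding window cannot be gamed by bursting at a window boundary: the 300-tweet budget always covers exactly the trailing three hours.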

GET /feed — Home timeline

// Request
GET /v1/feed?limit=20&cursor=eyJsYXN0X3R3ZWV0X2lkIjoiMTg0MDAwIn0
Authorization: Bearer <token>

// Response — 200 OK
{
  "tweets": [
    {
      "tweet_id": "1843927401234567168",
      "text": "Hello world",
      "author": { "user_id": "u_42", "handle": "@alice" },
      "created_at": "2026-03-29T10:00:00Z",
      "engagement": { "likes": 128, "retweets": 34 },
      "media": []
    }
    // ... up to limit items
  ],
  "next_cursor": "eyJsYXN0X3R3ZWV0X2lkIjoiMTgzOTAwIn0",
  "has_more": true
}

Cursor-based pagination (not offset) is mandatory here — offset pagination would require counting rows from the beginning of the timeline on every page, which is expensive on distributed caches. The cursor encodes the last seen tweet_id; the feed service continues from that point.
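The opaque cursors in the example above are just base64-encoded JSON carrying the last-seen tweet_id. A minimal encode/decode/page sketch (the linear `.index` scan is for illustration only — against a Redis sorted set, the decoded ID would seed a ZREVRANGEBYSCORE instead):

```python
import base64, json

def encode_cursor(last_tweet_id: str) -> str:
    payload = json.dumps({"last_tweet_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> str:
    return json.loads(base64.urlsafe_b64decode(cursor))["last_tweet_id"]

def page(timeline_ids, limit, cursor=None):
    """timeline_ids is newest-first; resume strictly after the cursor's ID."""
    start = 0 if cursor is None else timeline_ids.index(decode_cursor(cursor)) + 1
    chunk = timeline_ids[start:start + limit]
    next_cursor = encode_cursor(chunk[-1]) if chunk else None
    return chunk, next_cursor, (start + limit) < len(timeline_ids)
```

Because the cursor names a position rather than an offset, pages stay stable even as new tweets are prepended to the timeline between requests.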

Endpoint | Level | Notes
GET /users/:id/tweets | L3/L4 | User's own timeline; served directly from tweet store
DELETE /tweets/:id | L3/L4 | Soft-delete; triggers async cache invalidation across fan-out timelines
POST /tweets/:id/like | L5 | Idempotent; counter update via async event; returns optimistic count
GET /feed/delta?since=tweet_id | L5 | Returns new tweets since last poll — used by clients for real-time updates
GET /feed?algorithm=ranked | L7/L8 | Toggle between chronological and ML-ranked feed
06

Core read flow

The latency NFR from §2 (200 ms p99) is determined almost entirely by the read path — specifically by the two sequential cache lookups (timeline IDs, then tweet objects). The write path is comparatively relaxed; a 500 ms write is acceptable. The read path is where the architecture either succeeds or fails at scale.

Feed read flow:
1. GET /feed (client) → fetch timeline IDs from Redis.
2. Cache hit → celebrity merge (pull and interleave celebrity tweets) → hydrate tweet objects (batch, parallel from Redis).
3. Cache miss → rebuild from tweet store: query Cassandra (multi-followee fan-in), backfill timeline cache (async, background).
4. Sort (chronological or ML-ranked; apply ranking if enabled) → return paginated feed.
Feed read flow: timeline IDs fetched from Redis, celebrity tweets merged on the fly (hybrid fan-out), tweet objects hydrated in parallel, sorted and returned.

The key tradeoff in this flow is the cache-miss path. A cold cache (user hasn't loaded their feed in days) requires querying the tweet store for each followee's recent tweets, merging and sorting them, then populating the cache — potentially 200–500 ms. To hit 200 ms p99, the cache-miss path must rarely execute: once more than 1% of requests miss, the p99 becomes the miss-path latency itself, so the miss rate must be held well under 1% (or the miss path driven below 200 ms). This is why timelines are preheated for active users in the background (§8) rather than rebuilt lazily on first request.
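The percentile arithmetic behind that constraint — why the miss rate, not the average, decides the p99 — fits in a few lines (the latency values are illustrative):

```python
HIT_MS, MISS_MS = 50, 400   # fast path vs cache-miss rebuild (illustrative)

def p99_latency(miss_rate: float, n: int = 10_000) -> int:
    """p99 over n requests with a fixed hit/miss latency split."""
    misses = int(n * miss_rate)
    latencies = sorted([HIT_MS] * (n - misses) + [MISS_MS] * misses)
    return latencies[int(n * 0.99)]

# At a 2% miss rate the p99 IS the miss path (400 ms);
# below 1% the p99 stays on the hit path (50 ms).
```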

💡

Chronological vs ranked: this flow shows chronological ordering (sort by tweet_id / created_at). At L5+, you're expected to note that reverse-chronological is a special case of ranking — and that algorithmic ranking adds a re-scoring step after hydration, using a model that evaluates engagement signals, relationship strength, and content type. This is why tweet objects (not just IDs) must be hydrated before ranking can occur.

07

Data model

There are four core entities: User, Tweet, Follow, and Timeline. Each is accessed very differently, which ends up shaping the storage choice per entity.

Access patterns

Operation | Frequency | Query shape
Fetch user profile | Medium (per feed hydration) | Point lookup by user_id
Fetch tweets by user | Medium (user timeline) | Range scan: user_id, last N tweets
Fetch home timeline IDs | Very high (every feed load) | Point lookup by user_id → ordered list
Fetch tweet by ID | Extremely high (hydration) | Point lookup by tweet_id
Lookup followers by user_id | High (fan-out on write) | Range scan: tweeter_id → all follower_ids
Check follow relationship | Medium (authorization) | Point lookup: (follower_id, followee_id)
Increment like/retweet count | Extremely high | Counter increment; not per-row

Two patterns stand out from this table. First, timeline ID lookups and tweet hydration dominate by volume — both are point lookups, which maps cleanly onto caches (Redis) and key-value stores. Second, follower scans are write-path-critical but read infrequently relative to timeline reads — this justifies a specialised graph store rather than a general-purpose database.

Schema

users (relational)
  user_id BIGINT PK · handle VARCHAR(15) · display_name VARCHAR(50) · bio TEXT · follower_count INT · is_celebrity BOOL ★ · created_at TIMESTAMP · profile_img_url TEXT

tweets (Cassandra)
  user_id BIGINT PK · tweet_id BIGINT CK↓ · text TEXT · media_ids LIST<TEXT> · reply_to BIGINT NULL · like_count → counters ★ · retweet_count → counters ★ · created_at TIMESTAMP

follows (graph)
  follower_id BIGINT PK · followee_id BIGINT CK · created_at TIMESTAMP
  idx: followee_id (reverse index for fan-out lookups)

timeline cache (Redis)
  KEY timeline:{user_id} · TYPE sorted set (ZADD) · SCORE tweet_id (Snowflake → time-ordered) · MAX 800 tweet IDs per user ★

Legend:
  ★ is_celebrity — BOOL flag; fan-out workers skip push for flagged accounts
  → counters — like/retweet counts live in a separate tweet_counters table (tweet_id PK, COUNTER cols only — a Cassandra hard constraint)
  CK↓ — clustering key DESC; newest tweets returned first
  Snowflake tweet_id is monotonically increasing → safe as a Redis ZADD score for chronological order
Core data model. Four entities, three storage systems: users in a relational store, tweets in Cassandra (wide-column), follow edges in a graph store, home timelines in Redis sorted sets.
Why cap the timeline cache at 800 entries?

The timeline cache holds tweet IDs, not full objects — each ID is 8 bytes. 800 IDs × 8 bytes = 6.4 KB per user. With 200M active users, 200M × 6.4 KB ≈ 1.3 TB total — a manageable Redis cluster size. Beyond 800 entries, users requesting older tweets would trigger a cache-miss rebuild from Cassandra, which is acceptable since infinite scroll requests for old content are rare and can tolerate higher latency.

Tradeoff: 800 is an approximation calibrated for "most active users" content; a user following 5,000 accounts may burn through 800 entries in minutes if all followees tweet simultaneously. The capacity estimator in §3 can be adjusted for different active-follow distributions.
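The cap is enforced on every fan-out write. With a dict standing in for the sorted set (real code: ZADD followed by ZREMRANGEBYRANK 0 -801), the trim looks like:

```python
MAX_ENTRIES = 800   # per-user timeline cap from the schema above

def push_to_timeline(timeline: dict, tweet_id: int) -> None:
    """Add a tweet ID, then evict the oldest entry past the cap.
    Scores are Snowflake IDs, so the minimum key is the oldest tweet."""
    timeline[tweet_id] = tweet_id
    while len(timeline) > MAX_ENTRIES:
        del timeline[min(timeline)]
```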
Why are like/retweet counts in a separate table?

Cassandra enforces a hard constraint: a table with COUNTER columns cannot contain any non-counter columns (and vice versa). Placing like_count and retweet_count in the tweets table alongside text and media_ids would cause the CREATE TABLE to fail at runtime. The correct approach is a separate tweet_counters table keyed purely on tweet_id with COUNTER columns, or — as most production systems do — a dedicated counter service (Redis INCRBY or a purpose-built counter store) that decouples engagement writes from tweet reads entirely.

Production pattern: Twitter historically used a separate counter service backed by a persistent store (not Cassandra) for engagement counts, since counter reads on the hot tweet hydration path need sub-millisecond latency. Redis INCRBY on a per-tweet key satisfies this; counts are periodically checkpointed to durable storage for recovery after Redis restart.
Alternatives: Redis INCRBY · DynamoDB atomic counter · Dedicated counter shard (MySQL)
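A sketch of that buffered-counter pattern: increments land in a fast in-memory counter (Redis INCRBY in production), reads add the unflushed delta to the durable baseline, and a periodic flush checkpoints the deltas. The class and method names are illustrative:

```python
from collections import Counter

class CounterService:
    def __init__(self, durable: dict):
        self.durable = durable      # stands in for the persistent counter store
        self.pending = Counter()    # stands in for Redis INCRBY keys

    def like(self, tweet_id: str) -> None:
        self.pending[tweet_id] += 1

    def count(self, tweet_id: str) -> int:
        # Approximate by design: durable baseline + unflushed delta.
        return self.durable.get(tweet_id, 0) + self.pending[tweet_id]

    def flush(self) -> None:
        # Periodic checkpoint; bounds loss after a cache restart
        # to at most one flush interval of increments.
        for tweet_id, delta in self.pending.items():
            self.durable[tweet_id] = self.durable.get(tweet_id, 0) + delta
        self.pending.clear()
```

This is exactly the ≤ 10 s eventual-consistency contract from §2: readers may briefly see a count that lags by one flush interval, never a wrong baseline.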
Why a reverse index on follows.followee_id?

Without a secondary index (or a reverse mapping in the graph store), the fan-out lookup — all follower_ids for a given followee — would require a full table scan. The reverse index (keyed by followee_id) makes this lookup O(1) for the index and O(followers) for the scan — the only unavoidable cost.

Implementation note: In practice, Twitter's FlockDB stored both directions as separate edge lists: forward (who I follow) and reverse (who follows me). Both are sharded by user_id. Hot follower lists for celebrity accounts are cached in memory on graph service nodes to avoid repeated full scans.
08

Caching strategy

The 200 ms p99 latency NFR from §2 is only achievable because multiple cache layers eliminate database round trips on the hot path. Each cache layer has a specific role in the §4 architecture.

Request path: Client cache (~5 min TTL) → CDN / edge (static assets, avatars) → Timeline cache (Redis, tweet IDs) → miss → Hydration cache (Redis, tweet objects) → miss → Cassandra (source of truth).

Browser/app — caches rendered feed cards; avoids redundant fetches on re-open. TTL: 5 min.
Edge / CDN — profile avatars, media thumbnails; not tweet content (dynamic). TTL: 24 hr.
Timeline cache — pre-computed feed as sorted tweet IDs per user; written by the fan-out service. TTL: 7 days (LRU evict); max 800 entries.
Hydration cache — full tweet objects keyed by tweet_id; counts refreshed every ~5 s from the counter service. TTL: 24 hr.
Four-layer cache hierarchy. Each layer corresponds to its physical position in the §4 request path: client → CDN → Redis timeline (fan-out write target) → Redis hydration → Cassandra.
Cache layer | Stores | Invalidation | Miss fallback
Client cache | Rendered feed cards | TTL (5 min) + explicit pull-to-refresh | Full API request
CDN / edge | Static assets, avatars | TTL (24 hr) | Origin fetch + re-cache
Timeline cache (Redis) | Per-user ordered tweet ID lists | Fan-out service writes; delete on tweet removal; 7-day TTL | Rebuild from Cassandra (slow path)
Hydration cache (Redis) | Full tweet objects by tweet_id | Counter service pushes updates every ~5 s; 24-hr TTL | Direct Cassandra read
⚠️

Timeline cache invalidation on tweet deletion: when a tweet is deleted, the fan-out service must remove its ID from all follower timeline caches. For a user with 10M followers, this is 10M Redis DEL operations — effectively another fan-out event. The deletion is queued and processed asynchronously, so deleted tweets may remain visible in some followers' feeds for seconds to minutes. This is the deliberate choice: async deletion bounds the worst-case propagation time by the same fan-out throughput as normal writes, rather than blocking the deletion response on a synchronous sweep.

09

Deep-dive: scalability

A single-region, single-instance version of this system breaks in at least five ways as load grows. The following topics each address one of those breakage modes — they are independent of each other and can be drilled into in any order.

Production topology: CDN → LB cluster → Write API (N servers) and Read API (N servers) → Kafka cluster (partitioned by user_id) → fan-out workers (autoscaled) → Timeline Redis (cluster, 3 replicas) and Hydration Redis (cluster, 3 replicas), with the User Graph (sharded + cached), Cassandra (multi-DC, N nodes), and Analytics (Flink / ClickHouse) alongside. The entire stack is replicated across 3 geographic regions (US-East, EU-West, APAC), active-active with eventual consistency; users are routed to the nearest region, with cross-region replication of tweet events via Kafka MirrorMaker.
Production-scale architecture: every component clustered, sharded, and geo-replicated. Fan-out workers autoscale against Kafka consumer lag. Entire stack active-active across 3 regions.
Distributed tweet IDs — why not auto-increment? L5+ · ID generation

Auto-increment requires a central counter, which becomes a single point of failure and a write bottleneck at scale. Twitter-style systems use a distributed ID generator — Twitter's Snowflake generates 64-bit IDs encoding a timestamp (41 bits), datacenter ID (5 bits), machine ID (5 bits), and sequence (12 bits). IDs are time-ordered globally without coordination, which preserves sort-by-time properties in Cassandra's clustering column without a separate created_at index.

Tradeoff: Snowflake IDs are not strictly monotonic across machines — two machines generating at the same millisecond produce IDs that interleave at the sequence level. For feed ordering, this is acceptable; tweets within the same millisecond can be ordered arbitrarily.
Alternatives: UUID v7 (time-ordered) · ULID · Instagram's 64-bit scheme
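A minimal single-process generator following the bit layout above (41-bit millisecond timestamp, 5-bit datacenter, 5-bit machine, 12-bit sequence); the lock serialises callers on one node, while cross-node uniqueness comes from the datacenter/machine bits:

```python
import threading, time

TWITTER_EPOCH_MS = 1288834974657   # Twitter's custom epoch (Nov 2010)

class Snowflake:
    """64-bit IDs: 41-bit ms timestamp | 5-bit DC | 5-bit machine | 12-bit seq."""

    def __init__(self, datacenter_id: int, machine_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= machine_id < 32
        self._node = (datacenter_id << 17) | (machine_id << 12)
        self._seq = 0
        self._last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            ms = time.time_ns() // 1_000_000
            if ms == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF     # 12-bit sequence
                if self._seq == 0:                      # 4096 IDs this ms:
                    while ms <= self._last_ms:          # spin to next ms
                        ms = time.time_ns() // 1_000_000
            else:
                self._seq = 0
            self._last_ms = ms
            return ((ms - TWITTER_EPOCH_MS) << 22) | self._node | self._seq
```

Because the timestamp occupies the high bits, numeric order equals time order — which is what makes these IDs usable directly as Redis sorted-set scores and Cassandra clustering keys.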
Fan-out worker autoscaling under celebrity bursts L5+ · Burst handling

Kafka consumer lag is the signal for fan-out worker autoscaling. When a celebrity tweets, the relevant Kafka partition's consumer lag spikes from near-zero to millions of events. An autoscaler (e.g. Kubernetes HPA driven by custom Kafka metrics via KEDA — a Kubernetes-native event-driven autoscaler) provisions additional fan-out worker pods within 30–60 seconds. The Kafka buffer absorbs the burst during this provisioning window, ensuring no events are dropped.

Freshness implication: During the 30–60 s provisioning window, fan-out lag may temporarily exceed the 5 s freshness NFR from §2. This is expected and acceptable for burst events. In steady state, lag is minimised. The 5 s NFR should be qualified as a steady-state target, not a burst-event guarantee.
Geo-replication and cross-region consistency L6+ · Multi-region

The system runs active-active across regions: users are routed to their nearest region by GeoDNS. Active-active is chosen over active-passive because the latency NFR (200 ms p99) cannot be met when a user in APAC is routed to US-East as their primary — the round-trip alone consumes half the budget. The tradeoff is cross-region consistency: tweet writes are committed locally first, then replicated to other regions via a cross-region Kafka topic (using Kafka MirrorMaker or a similar replication bridge). Cassandra's multi-datacenter replication handles tweet storage. Timeline caches are populated per-region — a user switching regions may see a slightly stale feed until the regional cache is populated from the regional tweet store.

Tradeoff: Active-active with eventual consistency means a tweet posted in APAC may not be visible in EU for 2–5 seconds. This is generally acceptable; the alternative (synchronous cross-region writes) would add 100–200 ms to every tweet's write latency.
ML-based feed ranking at scale L6/L7 · Ranking

Algorithmic ranking takes the candidate set from the timeline cache (~200 tweets) and scores each using a model that evaluates recency, author relationship strength, tweet engagement rate, and content type (video > images > text in engagement curves). The model runs as a low-latency inference service. Features are served from a feature store (e.g. Feast — an open-source ML feature store, or a Redis-backed custom implementation) that caches pre-computed user and tweet features to keep feature retrieval under 10 ms.

Tradeoff: ML ranking adds ~20–50 ms to feed generation — acceptable within the 200 ms budget. The model objective function (engagement maximisation vs session satisfaction vs advertiser value) is a product decision that significantly shapes which tweets surface. An interviewer at L7/L8 will ask you to articulate this tradeoff explicitly.
Approaches: Two-tower retrieval (user embedding ↔ tweet embedding) · Logistic regression (fast) · GNN (relationship-aware)
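The logistic-regression approach above is the simplest to sketch: score each candidate with a weighted linear combination of features, squash through a sigmoid, and sort. Feature names and weights here are illustrative assumptions, not Twitter's actual model.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: int
    age_hours: float          # recency
    author_affinity: float    # relationship strength, 0..1
    engagement_rate: float    # engagements per impression, 0..1
    is_video: bool            # content-type signal

# Hypothetical weights; in production these come from model training.
WEIGHTS = {"recency": -0.3, "affinity": 2.0, "engagement": 1.5, "video": 0.4}

def score(c: Candidate) -> float:
    """Linear combination of features, mapped to P(engage) via sigmoid."""
    z = (WEIGHTS["recency"] * c.age_hours
         + WEIGHTS["affinity"] * c.author_affinity
         + WEIGHTS["engagement"] * c.engagement_rate
         + WEIGHTS["video"] * (1.0 if c.is_video else 0.0))
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates: list[Candidate]) -> list[Candidate]:
    return sorted(candidates, key=score, reverse=True)
```

Note the built-in fallback: if the ranker's feature store is unavailable, sorting by `age_hours` alone recovers the chronological feed.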
Redis cluster sharding for timeline at scale L5+ · Sharding

A single Redis instance cannot hold 200M user timelines. Redis Cluster shards data across N nodes by hashing each key (timeline:{user_id}) into one of 16,384 hash slots, which are distributed across the nodes. Each shard is replicated to 2–3 replica nodes for availability. At ~1.3 TB total timeline cache size (from §3), a 20-node cluster with 100 GB of memory per node provides comfortable headroom once a 1.5× allowance for overhead and replication is included.

Hot shard risk: Celebrity user_ids that appear frequently in fan-out operations (as the tweeter, not the follower) do not create hot shards — fan-out writes target follower timelines, which are distributed across the cluster by follower user_id. The celebrity's own timeline key is rarely written.
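The node count quoted above is easy to sanity-check with back-of-envelope arithmetic; the constants simply restate the figures from this section.

```python
import math

TIMELINE_BYTES = 1.3e12       # ~1.3 TB total timeline cache (from §3)
OVERHEAD_FACTOR = 1.5         # allowance for replication + Redis overhead
NODE_MEMORY_BYTES = 100e9     # 100 GB usable memory per node

nodes = math.ceil(TIMELINE_BYTES * OVERHEAD_FACTOR / NODE_MEMORY_BYTES)
print(nodes)  # -> 20
```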
10

Failure modes & edge cases

ScenarioProblemSolutionLevel
Timeline cache node failure Followers' pre-computed timelines unavailable; all reads miss cache Redis cluster failover to replicas; fall back to reading directly from Cassandra while cache rebuilds L3/L4
Fan-out backlog after worker crash Kafka consumer lag grows; followers don't see new tweets Kafka's offset commit ensures at-least-once delivery; new workers resume from last committed offset L3/L4
Duplicate tweet from retry Client retries POST /tweet on network failure; tweet posted twice Idempotency key on request; server deduplicates within 24-hour window using Redis SET NX L5
User follows celebrity; backfill delay New follower sees empty feed or misses celebrity's recent tweets Background backfill job on follow event; injects last N=50 celebrity tweets into new follower's timeline cache L5
Celebrity tweet during live event (50M+ followers) Fan-out workers overwhelmed; freshness NFR breached for normal users Celebrity accounts always use pull path; worker autoscaling handles smaller celeb accounts; rate-limit fan-out per second L5
Cross-region timeline inconsistency User sees different feeds depending on which region they hit Consistent routing by user_id to primary region; regional cache eventually converges via Kafka replication L5
Tweet deletion lag in timelines Deleted tweet visible in follower feeds for seconds to minutes Async deletion fan-out queued via same Kafka path; client-side soft-delete flag on API response for acknowledged deletes L5
User unfollows an account Unfollowed user's previously fanned-out tweet IDs remain in the follower's timeline cache, showing stale content Read-time filter: cross-check tweet authors against the current follow set on each feed fetch (preferred — no write amplification). Proactive purge via async Kafka event is possible but expensive for high-volume tweeters. L5
Counter divergence (like counts) Counter service lag causes like counts to differ across regions by >10 s Bounded staleness accepted in NFR (§2); counter sync frequency tunable; display as approximate above 1,000 (e.g. "1.2K likes") where rounding conceals the cross-region divergence L7/L8
Ranking model degradation ML ranker serves stale features or crashes; feed quality degrades silently Circuit breaker on ranker service; fall back to chronological ordering on failure; shadow mode testing for model updates L7/L8
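The duplicate-tweet guard from the table can be sketched concisely. In production the claim would be a single Redis command, `SET idem:<key> 1 NX EX 86400`; here an in-memory dict with expiry stands in for Redis so the sketch is self-contained. Key naming and helper names are illustrative assumptions.

```python
import time
from typing import Optional

DEDUP_TTL_SECONDS = 24 * 3600
_seen: dict[str, float] = {}   # idempotency key -> expiry timestamp

def claim(idempotency_key: str, now: Optional[float] = None) -> bool:
    """Mimics Redis SET NX EX: True on first claim, False for duplicates."""
    now = time.time() if now is None else now
    expiry = _seen.get(idempotency_key)
    if expiry is not None and expiry > now:
        return False                    # key still held: duplicate request
    _seen[idempotency_key] = now + DEDUP_TTL_SECONDS
    return True

def post_tweet(idempotency_key: str, payload: dict) -> bool:
    """Returns True if accepted, False if the request was a retry."""
    if not claim(idempotency_key):
        return False                    # retried POST /tweet becomes a no-op
    # ... persist to Cassandra and enqueue the fan-out event here ...
    return True
```

Because the claim happens atomically in Redis (NX succeeds only if the key is absent), concurrent retries race safely: exactly one wins.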
11

How to answer by level

L3 / L4 Working system with push fan-out
What good looks like
  • Define functional requirements (post, read feed, follow)
  • Simple fan-out on write to a timeline table or cache
  • Two API endpoints with basic schemas
  • Identify that reads vastly outnumber writes (read-heavy system)
  • Mention Redis for timeline storage
Where L3/L4 typically falls short
  • Accepts the fan-out design without questioning write amplification for large followings
  • Skips capacity estimation — draws boxes without sizing the fan-out QPS or cache footprint
  • Stores full tweet objects in the timeline cache, coupling engagement updates to timeline invalidation
L5 Tradeoffs in fan-out strategy
What good looks like
  • Articulate push/pull/hybrid tradeoffs without prompting
  • Explain the follower threshold and how it's tuned
  • Discuss cursor pagination vs offset and why
  • Bring up idempotency on the write API
  • Note that engagement counts are eventually consistent and explain why
Where L5 typically falls short
  • Describes the hybrid threshold as a fixed number rather than a capacity-planning function
  • Discusses caching as a single layer rather than two independent caches with separate invalidation paths
  • Omits ranking entirely, or treats chronological order as the only option
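The cursor-pagination point from the L5 checklist is worth making concrete. A minimal sketch, assuming the timeline is a newest-first list of (timestamp, tweet_id) pairs: the cursor is the last entry served, so the next page stays stable even as new tweets are prepended, which is exactly where numeric offsets break.

```python
from typing import Optional

Cursor = tuple[int, int]  # (timestamp, tweet_id), illustrative shape

def next_page(timeline: list[Cursor],
              cursor: Optional[Cursor],
              page_size: int = 20) -> tuple[list[Cursor], Optional[Cursor]]:
    """timeline is sorted newest-first; tuples compare (ts, id) lexically."""
    if cursor is not None:
        # Resume strictly after the cursor position (descending order).
        timeline = [entry for entry in timeline if entry < cursor]
    page = timeline[:page_size]
    new_cursor = page[-1] if len(page) == page_size else None
    return page, new_cursor
```

With an offset, a tweet arriving between page fetches shifts every index and the client sees duplicates; the cursor anchors the position in the data itself.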
L6 End-to-end ownership including ranking and ops
What good looks like
  • Full failure mode table without prompting
  • Ranking model architecture (two-tower, feature store, latency budget)
  • Geo-replication strategy and cross-region consistency tradeoffs
  • Observability — timeline cache hit rate: a drop below 95% signals fan-out lag or cold-start pressure
  • Observability — p99 feed hydration latency: a spike above 150 ms points to hydration cache misses
  • Observability — Kafka consumer lag on the fan-out topic: sustained lag above 10K events indicates worker capacity pressure
  • Graceful degradation: what happens if Redis is down?
  • Security & compliance: GDPR hard-delete propagation (delete must purge tweet from all timeline caches and the tweet store); async content moderation hook as a downstream Kafka consumer; API gateway rate limiting to prevent write-path abuse
Where L6 typically falls short
  • Frames ranking as a pure engineering choice without connecting it to the business objective it optimises for
  • Makes build-vs-buy decisions by intuition rather than reasoning through cost, operational burden, and vendor lock-in
  • Cannot estimate the system's cost profile — which component dominates the infrastructure bill and how it scales
L7 / L8 Architecture ownership and business framing
What good looks like
  • Frame the design around business objective first (engagement, DAU, revenue)
  • Challenge requirements — "do we need 5 s freshness or is 30 s acceptable to save 10× on infrastructure?"
  • Ranking objective function and its second-order effects (filter bubbles, viral amplification)
  • Cost model: what is the most expensive component and how does it scale?
  • Make strategic recommendations: "I would not build our own graph DB; use Neptune"
Distinguishing markers
  • Challenging assumptions rather than accepting them
  • Bringing in non-technical constraints (regulatory, vendor lock-in)
  • Estimating team structure and delivery timeline, not just system architecture

Classic interview probes

QuestionL3/L4 answerL5/L6 answerL7/L8 answer
"How do you handle celebrities?" Note that fan-out on write is expensive for large followings Hybrid: push below threshold, pull-merge at read time; explain threshold selection Threshold is a capacity planning function; discuss multi-tier celebrity classification (nano/micro/mega) with different handling per tier
"How does ranking work?" Reverse chronological is simplest ML ranker scores candidate set using engagement signals and relationship strength; feature store provides low-latency features Objective function choice drives user behaviour; engagement maximisation creates filter bubbles; need to balance multiple objectives explicitly
"What happens if Redis goes down?" Reads fail; feed is unavailable Redis Cluster with replicas; circuit breaker to Cassandra fallback; graceful degradation to slower but functional feed Layer failure budget into SLO; pre-warm Cassandra read path as standby; distinguish partial (single shard) from total failure
"How do you handle a user following 5,000 accounts?" Pre-compute the merged feed at fan-out time and cache it — so the read path hits a single key regardless of how many accounts the user follows. The 200 ms latency target is only achievable because reads are O(1) lookups, not O(followees) merges. Timeline cache stores pre-merged IDs; follow limits (Twitter: 5, 000 max) bound the worst case; merge at fan-out time not at read time Power-law follow distributions mean median user follows ~150; p99 case drives cache sizing; propose progressive limit enforcement at 1K, 2K, 5K
How the pieces connect
Every architectural decision traces back to a requirement or a capacity number.
01
200 ms latency NFR (§2) → feed cannot be assembled at request time from raw data → pre-computed timelines in Redis (§8) become mandatory, not optional.
02
230K fan-out writes/sec steady state (§3) + power-law follower distribution → pure push model collapses for celebrity accounts → hybrid fan-out with follower threshold (§5).
03
Eventual consistency for like counts (§2 NFR) → counter updates do not need to block tweet reads → hydration cache with async counter refresh (§8) keeps engagement counts separate from timeline IDs, so a like never invalidates a follower's pre-computed timeline.
04
5 s freshness NFR — steady-state median (§2) → fan-out must complete within the budget for typical cases → Kafka with autoscaled consumers (§9) decouples write latency from fan-out throughput; burst events may temporarily exceed the target, which is why the NFR is qualified as a median, not a hard bound.
05
Timeline IDs separated from tweet objects (§7 data model) → engagement count updates don't invalidate timelines → two Redis caches (§8) with independent invalidation paths, keeping fan-out cache stable under high engagement.
06
99.999999% tweet durability NFR (§2) → tweet store cannot be an in-memory system → Cassandra with multi-DC replication (§4, §9) as the durable source of truth, with caches acting as acceleration layers, never as the record of truth.
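The hybrid fan-out in connection 02 has a concrete shape at the read path: merge the pre-computed (push) timeline with tweets pulled live from the celebrity accounts the user follows. A minimal sketch; the data shapes and the helper name are illustrative assumptions.

```python
import heapq

def read_feed(pushed_ids: list[tuple[int, int]],
              celebrity_feeds: list[list[tuple[int, int]]],
              limit: int = 50) -> list[int]:
    """Each source is (timestamp, tweet_id) pairs, newest first.

    heapq.merge does a k-way merge, so cost is O(limit * log k) for k
    sources, not O(total entries): the pull path stays cheap at read time.
    """
    merged = heapq.merge(pushed_ids, *celebrity_feeds,
                         key=lambda entry: entry[0], reverse=True)
    return [tweet_id for _, tweet_id in list(merged)[:limit]]
```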