YouTube System Design Interview Guide
Simple to describe, hard to scale: serving billions of minutes of video per day requires rethinking every layer of the stack.
What the interviewer is testing
Video streaming comes up in FAANG interviews because it's genuinely hard in multiple directions at once: you're dealing with large binary assets, async processing pipelines, global delivery, and a client-side protocol most engineers have never had to think about. The "store files and serve them" framing candidates arrive with gets demolished pretty quickly.
Where level shows up most clearly: junior candidates know video needs different quality options; senior candidates understand why the file isn't served as a single blob and what ABR actually means in practice; staff and above think about the economics: CDN egress is a different cost category from database reads, and the architecture reflects that.
| Level | What good looks like |
|---|---|
| L3/L4 | Can identify the main components (upload, storage, streaming), separate upload from viewing, and mention different video qualities. |
| L5 | Understands adaptive bitrate, chunked upload resumability, CDN cache hierarchy, and async transcoding pipeline. Can reason about tradeoffs. |
| L6 | Owns the encoding pipeline end-to-end, designs the metadata service and counter architecture, and addresses failure modes in the transcoding path. |
| L7/L8 | Thinks about cost optimization (CDN egress, storage tiers), multi-region replication strategy, live vs. on-demand tradeoffs, and platform-level decisions around codec selection (AV1 vs. H.264). |
Requirements clarification
Before drawing anything, separate the two subsystems: the upload and transcoding path (write-heavy, async, latency-tolerant) and the playback path (read-heavy, synchronous, latency-critical). Most design mistakes come from conflating them.
Functional requirements
| Requirement | Notes |
|---|---|
| Upload video | Support large files (up to ~10 GB), resumable if interrupted |
| Transcode to multiple qualities | 360p, 480p, 720p, 1080p, 4K, async after upload |
| Stream video adaptively | Client selects quality dynamically based on bandwidth |
| Video metadata | Title, description, tags, uploader, duration, thumbnail |
| Search & browse | Search by title/tag, home feed, channel view, out of scope for this design |
| View count & engagement | Like count, views, eventually consistent is acceptable |
Non-functional requirements
| Requirement | Target |
|---|---|
| Playback start latency | < 2s to first frame on good network |
| Rebuffering rate | < 1% of sessions experience a stall |
| Upload availability | 99.9%, a failed upload is recoverable by retrying |
| Streaming availability | 99.99%, a playback interruption is a direct user experience failure |
| Transcoding latency | Processing begins within 60s of upload completion; standard video available within 5 min |
| Storage durability | 11 nines (same as S3 standard), data loss is unacceptable |
| View count accuracy | Eventual consistency acceptable; counts may lag by 60s |
Playback latency < 2s (NFR reasoning)
Two seconds to first frame is the boundary where users begin to perceive "slowness." Studies on streaming abandonment show a sharp cliff at 3s: roughly 10% of viewers leave for every additional second of startup delay. The 2s target drives the requirement for CDN edge caching (§8): without it, every playback request would traverse the full network path to origin, adding 100–500ms of latency globally.
View count eventual consistency (NFR reasoning)
View counts are inherently a high-write, low-precision metric. At 500M daily active users each watching several videos, view events arrive at tens of thousands of writes per second platform-wide (500M × ~10 views/day ≈ 58K writes/second on average), with viral videos concentrating thousands of those per second on a single row. Counting with strong consistency would force every increment through that hot row. A 60-second lag is imperceptible to users and has no product-level consequence. This drives the Redis counter buffer + periodic flush design in §8.
Streaming availability 99.99% vs upload 99.9% (NFR reasoning)
These are intentionally asymmetric. An upload failure is a recoverable event: the uploader gets an error and retries. A playback failure during an active session is an irreversible experience failure: the user has already started watching. 99.99% means <52 minutes of downtime per year; 99.9% allows ~8.7 hours. This asymmetry justifies different infrastructure costs for the two paths: the streaming path warrants multi-region failover; the upload path does not.
Capacity estimation
Video systems are unusual in that storage and bandwidth dominate costs rather than compute or QPS. A single minute of 1080p video compresses to roughly 40 MB at H.264 (~5 Mbit/s). Encoding all 5 quality tiers (360p through 4K) multiplies that by ~5.9×: 4K alone is roughly 4× the size of 1080p, and the lower tiers (720p, 480p, 360p) add another ~0.9× combined. CDN egress, not storage, is usually the biggest line item: a popular video served to 1M viewers costs more to deliver than to store.
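The tier multiplier can be sanity-checked with a few lines of arithmetic. The bitrates below are assumptions (a typical H.264 ladder; actual figures vary by content and encoder settings), not values taken from the estimator:

```python
# Back-of-envelope storage math for a 5-tier encoding ladder.
# Bitrates are assumed, typical H.264 values -- not authoritative figures.
LADDER_MBPS = {"360p": 0.7, "480p": 1.2, "720p": 2.5, "1080p": 5.0, "2160p": 20.0}

def mb_per_minute(mbps: float) -> float:
    """Megabytes of storage for one minute of video at the given bitrate."""
    return mbps * 60 / 8  # Mbit/s * seconds / bits-per-byte

total = sum(mb_per_minute(b) for b in LADDER_MBPS.values())
multiplier = total / mb_per_minute(LADDER_MBPS["1080p"])

print(f"1080p alone: {mb_per_minute(5.0):.1f} MB/min")
print(f"all tiers:   {total:.0f} MB/min ({multiplier:.1f}x 1080p)")
```

With these assumed bitrates, 4K contributes 4× the 1080p size and the three lower tiers about 0.88× combined, which is where the ~5.9× figure comes from.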
[Interactive capacity estimator (CDN egress bandwidth estimate) — omitted in this text version]
High-level architecture
A video streaming platform separates two fundamentally different subsystems: the upload and transcoding path (write-heavy, asynchronous, latency-tolerant) and the playback path (read-heavy, synchronous, latency-critical). Producers upload raw video files; these are asynchronously encoded into multiple quality variants and distributed to CDN edge nodes. Viewers stream encoded segments directly from the nearest CDN edge, with the player selecting quality adaptively based on available bandwidth.
Upload path (top): chunked upload → object store → async transcoding. CDN tier (middle): encoded segments flow down to CDN Origin; CDN Edge caches and serves. Playback path (bottom): viewer fetches manifest via API Gateway, then streams segments directly from CDN Edge.
Upload API orchestrates uploads without ever touching the video bytes. It issues pre-signed S3 multipart upload URLs to the client; the client uploads each chunk directly to S3, bypassing the API server entirely. Once the client calls /upload/complete with the per-chunk ETags, the API calls S3's CompleteMultipartUpload and publishes a transcoding job to the queue. This keeps the Upload API stateless and prevents it from becoming a bandwidth bottleneck: S3 absorbs all the byte-transfer load.
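The orchestration step amounts to chunk planning: given a file size and chunk size, derive the part list the API would request pre-signed URLs for. A sketch under assumed names (in a real implementation each part would be paired with a pre-signed S3 `upload_part` URL, omitted here):

```python
# Plan multipart upload parts: each part gets a 1-based number and a byte range.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, matching the /upload/init response example

def plan_parts(file_size: int, chunk_size: int = CHUNK_SIZE):
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers are 1-based
    while offset < file_size:
        end = min(offset + chunk_size, file_size)  # last part may be partial
        parts.append({"part_number": part_number, "start": offset, "end": end})
        offset = end
        part_number += 1
    return parts

parts = plan_parts(2 * 1024**3)  # the 2 GB example file -> 410 parts
```

The client retries individual parts independently and records each returned ETag; resumability falls out of the fact that every part is acknowledged separately.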
Object Store holds two kinds of data: raw uploads awaiting transcoding, and the encoded segment files that are the actual playback artefacts. These are structurally separate prefixes within the same store. The raw bucket is ephemeral (files may be lifecycle-deleted after 30 days); the segments bucket is permanent and must be globally durable.
Message Queue (a durable log like Kafka) decouples the upload API from the transcoding workers. Without this, upload latency becomes coupled to transcoding capacity: a transcoding backlog would block new uploads. The queue also provides retry semantics: if a transcoding job fails, the message is not acknowledged and gets re-delivered.
Transcoding Workers consume jobs from the queue, pull raw video from the object store, run FFmpeg (or a custom encoder), and write the output segments back to the object store. The work is CPU-bound and embarrassingly parallel: during peak upload hours you scale out the fleet; overnight you scale it back. Workers are stateless, so there's no coordination overhead.
Metadata DB stores video metadata (title, description, uploader, duration, manifest URL,
processing status). This is a relational workload, a SQL database (PostgreSQL) is appropriate. The
metadata record is created at upload time with status processing and updated to
ready once transcoding completes.
CDN Edge (a content delivery network like Cloudflare or Akamai) serves video segments from points of presence near the viewer. The CDN pulls segments from the object store on first request and caches them at the edge. For popular videos, 95%+ of requests are served entirely from edge cache without touching origin.
API Gateway handles the viewer's two non-video requests: fetching video metadata and fetching the manifest file that describes available qualities and segment URLs. These are tiny JSON payloads: the gateway routes to the metadata service, which hits the database (or cache) and returns within milliseconds.
Architectural rationale
Separate upload and playback API surfaces (isolation)
Upload is write-heavy, latency-tolerant, and intermittently bursty (viral event → sudden upload spike). Playback is read-heavy, latency-critical, and needs 99.99% availability. Running them as separate services means a transcoding backlog or upload service incident cannot degrade streaming. This directly addresses the asymmetric NFRs from §2.
Async transcoding via message queue (decoupling)
Transcoding a 1-hour video takes minutes even on fast hardware, far too slow to do synchronously in an upload request. The queue (Kafka) acts as a buffer: upload completes instantly, transcoding happens when workers are available. Note that consumer-group offsets give at-least-once delivery, not exactly-once: a worker that crashes before committing its offset causes the job to be re-delivered, so workers must be idempotent rather than relying on the queue to prevent duplicate processing.
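Because unacknowledged jobs are re-delivered, the worker loop needs a dedupe guard. A minimal sketch, with an in-memory set standing in for a durable dedupe store (in production this record would live in the metadata DB or Redis, keyed by job id):

```python
# Idempotent job consumption: skip jobs already processed.
processed: set[str] = set()

def handle_job(job_id: str, transcode) -> bool:
    """Returns True if the job was run, False if it was a duplicate delivery."""
    if job_id in processed:
        return False          # re-delivered duplicate: acknowledge, don't re-run
    transcode(job_id)         # do the actual work (may itself be retried)
    processed.add(job_id)     # record completion only after success
    return True

runs = []
handle_job("vid_1", runs.append)
handle_job("vid_1", runs.append)  # duplicate delivery: skipped
```

The ordering matters: completion is recorded only after the work succeeds, so a crash mid-job leaves the dedupe record absent and the re-delivered job runs cleanly.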
CDN-first delivery, segments never served from origin (cost + latency)
Video segments are immutable once encoded: a given segment URL will always return the same bytes. This is ideal for CDN caching: a segment cached at an edge node serves unlimited viewers without touching origin. Without a CDN, a single viral video could saturate an entire datacenter's egress. The CDN doesn't just improve latency; it makes the unit economics of video streaming viable.
Real-world comparison
| Decision | This design | YouTube | Netflix |
|---|---|---|---|
| Streaming protocol | HLS or MPEG-DASH | MPEG-DASH (custom) | MPEG-DASH |
| Encoding | FFmpeg workers on Kafka jobs | Spanner-coordinated transcoding | Cosmos (distributed encoding) |
| CDN delivery | Third-party CDN (Cloudflare/Akamai) | Google's own CDN infrastructure | Open Connect (ISP-embedded) |
| Primary codec | H.264 (widest compat) | VP9 / AV1 | H.264 / H.265 / AV1 |
| Metadata store | PostgreSQL | Bigtable + Spanner | Cassandra + MySQL |
Every major platform eventually builds custom infrastructure around CDN delivery because egress is the dominant cost. The decisions cascade from codec (AV1 compresses roughly 30–50% better than H.264, which translates directly to egress savings at scale) through to delivery architecture. YouTube and Netflix chose differently from each other, and both chose differently from this design, because their scale, device targets, and cost structures differ.
Adaptive bitrate streaming
Adaptive bitrate (ABR) streaming is a video delivery technique where the source video is cut into short segments (typically 4–6 seconds) and encoded at multiple quality levels — for example 360p, 720p, 1080p, and 4K. A manifest file (HLS .m3u8 or DASH .mpd) lists all available quality variants and their segment URLs. The client player measures current network bandwidth and buffer fill, then requests each segment at the highest quality it can sustain without rebuffering. The server’s job is to make all versions available; the adaptation logic lives entirely in the player.
Most distributed systems serve one canonical response to a request. Video streaming serves dozens of versions of the same content; the client picks which one to fetch next, segment by segment, based on what its network can handle right now.
The mechanism: cut the video into short segments (4–6 seconds each), encode each segment at multiple quality levels, and publish a manifest listing all the options. The client downloads the next segment at whatever quality its measured throughput can sustain. Because it's only committing 4–6 seconds ahead, it can step down quality quickly when the network degrades, without making the viewer wait for a full rebuffer.
A single source video produces 5 quality variants. The manifest lists all variants; the ABR client picks which to request, segment by segment, based on measured bandwidth and buffer state.
The two standard protocols: HLS and MPEG-DASH
| | HLS (HTTP Live Streaming) | MPEG-DASH |
|---|---|---|
| Origin | Apple (2009) | MPEG consortium (2012) |
| Manifest format | .m3u8 (plain text) | .mpd (XML) |
| Segment format | MPEG-TS or fMP4 | fMP4 (fragmented) |
| Native device support | All Apple; most modern browsers | Android, Smart TVs, browsers (via JS library) |
| Codec flexibility | Limited, H.264/H.265 primary | Any codec (AV1, VP9, H.265) |
| Royalties | Apple patents | Royalty-free |
Our choice for this design is HLS with fMP4 segments. HLS offers the broadest device compatibility (including all Apple devices), and fMP4 segments are compatible with most DASH players, giving us the option to serve both protocols from the same segment files. For interview purposes, either choice is fine: the important thing is explaining the tradeoff.
ABR algorithm internals (advanced)
The ABR algorithm runs on the client and makes a quality decision before fetching each segment. The simplest algorithm (throughput-based) measures the download speed of the last N segments and picks the highest bitrate that fits within 80% of that speed. More sophisticated algorithms (BOLA, MPC) consider buffer occupancy: if the buffer is draining fast, step down quality; if the buffer is full, it's safe to step up. MIT's Pensieve (SIGCOMM 2017) used reinforcement learning to train per-network-condition ABR policies, an example of where L7-level thinking applies.
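The simple throughput-based rule fits in a few lines. The 80% safety factor is from the text; the ladder bitrates are assumed values for illustration:

```python
# Throughput-based ABR: pick the highest rendition whose bitrate fits
# within a safety fraction of recently measured download throughput.
LADDER_KBPS = [700, 1200, 2500, 5000, 20000]  # 360p .. 4K (assumed ladder)

def pick_bitrate(recent_throughputs_kbps: list[float], safety: float = 0.8) -> int:
    estimate = sum(recent_throughputs_kbps) / len(recent_throughputs_kbps)
    budget = estimate * safety                      # leave 20% headroom
    eligible = [b for b in LADDER_KBPS if b <= budget]
    return max(eligible) if eligible else min(LADDER_KBPS)  # floor at lowest tier

pick_bitrate([4000, 3500, 3800])   # avg ~3767 * 0.8 ~ 3013 -> 2500 kbps tier
pick_bitrate([300, 250])           # below every tier -> fall back to 700
```

Buffer-aware algorithms like BOLA replace the fixed safety factor with a function of buffer occupancy, which is why they step down more gracefully under sudden congestion.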
Common follow-up: "What segment length should we use?" Shorter segments (2s) give faster adaptation but more HTTP overhead. Longer segments (10s) are more efficient but adapt slowly to network changes. Most platforms converge on 4–6 seconds as the sweet spot. Segment length is fixed at encoding time and encoded into the manifest.
API design
POST /videos/upload/init – Initiate chunked upload
// Request
{
"filename": "vacation.mp4",
"file_size_bytes": 2147483648, // 2 GB
"content_type": "video/mp4",
"title": "Summer Vacation 2025",
"description": "..."
}
// Response 201 Created
{
"upload_id": "upl_8f3k2j9...",
"video_id": "vid_4x2k9...",
"chunk_urls": [ // pre-signed S3 multipart URLs
{ "part_number": 1, "url": "https://s3.../..." },
// ...
],
"chunk_size_bytes": 5242880 // 5 MB chunks
}
The upload API returns pre-signed S3 multipart upload URLs. The client uploads each chunk directly to S3; the upload API server never touches the video bytes, only the metadata. This prevents the API server from becoming a bandwidth bottleneck and offloads the heavy lifting to S3's durable storage layer.
Auth and abuse prevention (don't skip this in an interview): The upload init endpoint must require authentication (OAuth token or API key): unauthenticated upload would allow anyone to fill your object store. Rate-limit per creator (e.g. 10 uploads/hour) to prevent abuse. Validate content_type server-side before issuing pre-signed URLs: only issue URLs for video MIME types. File type validation beyond MIME type (a magic-bytes check) happens after the upload lands in the raw bucket, before the transcoding job is enqueued.
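A magic-bytes check of the kind described, as a sketch: it recognises only the ISO BMFF (MP4/MOV) and EBML (WebM/Matroska) signatures, and is not a substitute for a full probe of the streams (e.g. with ffprobe):

```python
# Magic-byte validation run after the raw upload lands, before the
# transcoding job is enqueued. Only two container families are checked here.
def looks_like_video(header: bytes) -> bool:
    # ISO BMFF (MP4/MOV): bytes 4..8 spell 'ftyp' after the box-size field
    if len(header) >= 12 and header[4:8] == b"ftyp":
        return True
    # EBML header (WebM/Matroska): fixed 4-byte signature
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return True
    return False

looks_like_video(b"\x00\x00\x00\x20ftypisom....")   # MP4 -> accepted
looks_like_video(b"MZ\x90\x00" + b"\x00" * 8)       # Windows executable -> rejected
```

Reading only the first few bytes of the object (a ranged GET) keeps this check cheap even for multi-gigabyte uploads.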
POST /videos/upload/complete – Finalize and trigger transcoding
// Request
{
"upload_id": "upl_8f3k2j9...",
"parts": [
{ "part_number": 1, "etag": "\"abc123\"" },
// ...
]
}
// Response 202 Accepted
{
"video_id": "vid_4x2k9...",
"status": "processing",
"estimated_ready_at": "2025-06-01T12:05:00Z"
}
GET /videos/:id – Fetch video metadata and manifest
// Response 200 OK
{
"video_id": "vid_4x2k9...",
"title": "Summer Vacation 2025",
"uploader": { "id": "usr_...", "name": "Alice" },
"duration_seconds": 342,
"view_count": 18420,
"status": "ready", // or "processing" | "failed"
"manifest_url": "https://cdn.example.com/videos/vid_4x2k9/master.m3u8",
"thumbnail_url": "https://cdn.example.com/thumbs/vid_4x2k9.jpg"
}
| Endpoint | Level | Notes |
|---|---|---|
| DELETE /videos/:id | L5 | Soft delete, mark metadata as deleted, CDN cache TTL expires naturally. Hard delete from object store is async. |
| GET /videos/:id/captions | L5 | Returns available caption tracks (WebVTT format). Auto-generated via async ASR pipeline. |
| POST /videos/:id/views | L5 | Increments view counter. Idempotent per session via session token dedupe in Redis. |
| GET /videos/:id/analytics | L7/L8 | Per-segment watch-time heatmap. Requires separate analytics pipeline, a columnar store (ClickHouse or BigQuery). |
Thumbnail generation: Thumbnails are extracted as a lightweight step inside the
transcoding pipeline. After the first pass of encoding completes, FFmpeg extracts a frame at the 10%
mark of the video duration and stores it to the object store (e.g.
thumbs/{video_id}.jpg). The metadata record's thumbnail_url is populated
at the same time the manifest_url is set, so both become available atomically when
status transitions to ready.
Core flows: upload & playback
Upload flow
The latency NFR for uploads is relaxed: users expect processing to take minutes, not milliseconds. The primary requirements are reliability (no data loss) and resumability (a dropped connection should not require re-uploading 2 GB). These requirements together drive the multipart upload pattern, where each chunk is independently acknowledged before the next is sent.
Client uploads chunks directly to S3, bypassing the Upload API for byte transfer. API only handles metadata and orchestration. Transcoding is triggered asynchronously.
Playback flow
The playback latency NFR (<2s to first frame, from §2) drives the key design decision here: the manifest and first segment must be available from CDN edge before the player starts buffering. The manifest file is tiny (~1 KB) but is fetched on every play; it goes through the API gateway, which checks the metadata DB for the manifest URL and returns it. The segments themselves are fetched directly by the player from the CDN edge URL embedded in the manifest.
Why this matters for latency: The manifest round-trip through the API gateway adds ~50–100ms latency (fast, cached). If the first CDN edge segment is already cached, the player can start rendering within 200–400ms of playback start. If it needs to fetch from origin (cold cache), add another 100–500ms depending on geography. The CDN cache hit rate for popular content is typically >95%, making cold-cache paths rare in practice.
Data model
The video platform has three distinct data domains with very different access patterns. The access patterns are worth looking at first: they're what determine the schema, not the other way around.
| Operation | Frequency | Query shape |
|---|---|---|
| Fetch video metadata by ID | Very high (every playback) | Point lookup by video_id |
| List videos by uploader | High (channel page) | Range scan by uploader_id |
| Increment view count | Very high (every play event) | Counter increment by video_id |
| Update processing status | Low (once per transcoding job) | Point update by video_id |
| Fetch user profile | High | Point lookup by user_id |
Two things stand out from this table. First, the dominant read pattern is a point lookup by video_id; this is extremely cache-friendly and drives the Redis metadata cache in §8. Second, view count increments are so frequent that they cannot go directly to the primary database without creating a hot-spot on popular video rows; they need their own buffered write path.
Schema
-- Core video record
CREATE TABLE videos (
video_id VARCHAR(16) PRIMARY KEY,
uploader_id VARCHAR(16) NOT NULL REFERENCES users(user_id),
title TEXT NOT NULL,
description TEXT,
duration_sec INTEGER,
status VARCHAR(16) NOT NULL DEFAULT 'processing',
-- 'processing' | 'ready' | 'failed'
manifest_url TEXT, -- null until transcoding completes
thumbnail_url TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- View counts live separately, write-heavy, eventually consistent
-- Managed by counter service; flushed here periodically from Redis
CREATE TABLE video_stats (
video_id VARCHAR(16) PRIMARY KEY REFERENCES videos(video_id),
view_count BIGINT NOT NULL DEFAULT 0,
like_count BIGINT NOT NULL DEFAULT 0,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Indexes
CREATE INDEX idx_videos_uploader ON videos(uploader_id, created_at DESC);
CREATE INDEX idx_videos_status ON videos(status) WHERE status = 'processing';
video_stats as a separate table (hot-spot prevention)
Popular videos can receive thousands of view events per second. If view_count lived as a
column in the videos table, every increment would lock that row, creating severe write
contention. Separating it into video_stats doesn't solve the hot-row problem
alone, but combined with the Redis buffer pattern in §8, it means the database only
receives periodic bulk flushes rather than per-event increments. The updated_at
column lets us confirm flush recency.
status field on the videos table (polling simplicity)
After upload, the client can poll GET /videos/:id to check when
status transitions from processing to ready. This is
simpler than setting up a websocket or webhook for most upload clients. The partial index on
(status) WHERE status = 'processing' makes it efficient to query all in-flight
jobs from the monitoring dashboard without a full table scan.
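Client-side, the poll loop might look like this; `fetch_status` stands in for the GET /videos/:id call, and the backoff values are illustrative:

```python
import time

def wait_until_ready(fetch_status, timeout_s: float = 600, base_delay: float = 1.0):
    """Poll until status leaves 'processing', with capped exponential backoff."""
    deadline = time.monotonic() + timeout_s
    delay = base_delay
    while time.monotonic() < deadline:
        status = fetch_status()           # stand-in for GET /videos/:id
        if status != "processing":
            return status                 # 'ready' or 'failed'
        time.sleep(delay)
        delay = min(delay * 2, 30)        # double the delay, cap at 30s
    raise TimeoutError("video still processing after timeout")

# Stub that becomes ready on the third poll:
calls = iter(["processing", "processing", "ready"])
wait_until_ready(lambda: next(calls), base_delay=0.001)
```

Exponential backoff matters here because the transcoding-latency NFR (ready within ~5 min) means most videos sit in 'processing' for minutes: polling every second for that long is wasted load.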
Caching strategy
Video streaming uses three distinct caching layers: a CDN edge cache (stores immutable video segments close to viewers, TTL 7 days), a Redis metadata cache (stores serialized video JSON per video_id, handles ~1M QPS, TTL 10 minutes), and a Redis counter buffer (accumulates view and like increments in memory, flushed to PostgreSQL every 60 seconds). The CDN layer is what makes the economics of video streaming viable — at peak, 95%+ of segment requests are served from edge cache without touching origin.
A video uploaded once might be watched ten million times. That read/write ratio shapes every caching decision, and at this scale caching isn't optional; it's what makes the cost model work at all. There are three distinct layers, each addressing a different part of the request path.
Three caching layers: CDN edge for video bytes, Redis for metadata JSON, Redis for write-buffered view counters. Each layer is positioned in the §4 request path.
| Layer | What it caches | Why it exists | Invalidation |
|---|---|---|---|
| ① CDN Edge | Video segments (.ts / .m4s), thumbnails, manifests | Latency NFR from §2: <2s to first frame requires edge proximity. Also the only way to serve massive concurrent viewers economically. | Segments are content-addressed and immutable: TTL of 7 days, no explicit invalidation needed. Manifests TTL 5 min (status may change). |
| ② Redis Metadata | Serialised video metadata JSON per video_id | Every playback triggers a metadata fetch. DB can handle ~10K QPS; Redis handles ~1M QPS, essential at scale. | Explicit invalidation on any write: title edits, description changes, and, most critically, when status transitions from processing to ready (this is when manifest_url becomes non-null and must be visible to viewers immediately). TTL of 10 min as safety net. Write-through on update. |
| ③ Redis Counter | In-memory atomic counters: views:{video_id}, likes:{video_id} | View counts are write-heavy: direct DB increments at scale cause hot-row lock contention. Redis INCR is O(1) and lock-free. | Not a cache but a write buffer: a background job periodically flushes deltas to PostgreSQL using UPDATE video_stats SET view_count = view_count + $delta. |
Counter flush strategy and accuracy (L5+)
The flush job runs every 60 seconds per the eventual consistency NFR from §2. It captures each counter's current value and resets it atomically, then writes the delta to PostgreSQL. GETSET has been deprecated since Redis 6.2 (in favour of SET with the GET option); a clean approach is a Lua script, which Redis executes atomically:
-- Atomically capture and reset a counter.
-- Returns the previous value as a string, or nil if the key didn't exist
-- (i.e. no views since the last flush). Caller must handle nil.
local val = redis.call('GET', KEYS[1])
if val then redis.call('DEL', KEYS[1]) end -- delete (not reset to 0) so a cold key stays absent
return val -- nil means delta = 0; skip the DB write
The nil return matters: if a video received no views since the last flush, the key won't exist and the script returns nil. A flush job that doesn't handle nil will crash on every cold video in the counter keyspace. The fix is simple: skip the DB write when val is nil. A crash between the Redis reset and the PostgreSQL commit loses that interval's delta, an undercount rather than a double-count; the eventual consistency NFR explicitly permits this, and the magnitude is bounded by 60 seconds of view events.
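The surrounding flush job, sketched with a plain dict standing in for Redis (the capture-and-reset is the atomic Lua script above; `flush_to_db` stands in for the UPDATE shown in the caching table, and all names here are illustrative):

```python
# Counter flush job sketch. `counters` stands in for Redis.
counters = {"views:vid_4x2k9": 18420}

def capture_and_reset(key: str):
    """Return the accumulated delta and remove the key; None if no views."""
    return counters.pop(key, None)   # pop mirrors the Lua script's nil semantics

def flush(keys, flush_to_db):
    for key in keys:
        delta = capture_and_reset(key)
        if delta is None:
            continue                 # cold video: skip the DB write (the nil case)
        flush_to_db(key, delta)      # UPDATE ... SET view_count = view_count + delta

written = {}
flush(["views:vid_4x2k9", "views:vid_cold"], written.__setitem__)
# only the hot key produced a DB write
```

Running the DB writes as additive deltas (rather than overwriting with an absolute value) is what lets multiple flush workers, or a restarted one, coexist safely.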
Deep-dive scalability
The following scalability problems don't all arise at the same time. A service at 1M daily users looks different from one at 100M. These are the problems that emerge as you scale, roughly in the order they bite you.
Transcoding pipeline parallelism (L5+)
A naive transcoding worker processes a video sequentially, segment by segment, quality by quality. YouTube ingests roughly 500 hours of video per minute; the estimator in §3 defaults to 500 hours per day, about 0.07% of YouTube's actual ingest volume, intentionally scoped to a large-but-not-hyperscale service. Even at that lower scale, queue backlog is a real concern during upload spikes. The solution is to split work at two levels: across videos (multiple workers consume from Kafka concurrently) and within a single video (segment-level parallelism using a DAG scheduler). Each 4-second chunk can be encoded independently: a 1-hour video has ~900 chunks, each encodeable in parallel.
Platforms like Netflix have built internal distributed encoding systems (Netflix's is called Cosmos) where encoding is broken into thousands of micro-tasks dispatched to a serverless fleet. A 2-hour movie might run on 10,000 concurrent compute nodes to complete encoding in minutes rather than hours.
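Segment-level fan-out in miniature: `encode_segment` is a stub standing in for an FFmpeg invocation, and the chunking arithmetic matches the ~900 chunks-per-hour figure above:

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SECONDS = 4

def segment_jobs(duration_s: int):
    """One (start, end) job per 4-second chunk; a 1-hour video yields 900."""
    return [(t, min(t + SEGMENT_SECONDS, duration_s))
            for t in range(0, duration_s, SEGMENT_SECONDS)]

def encode_segment(job):                # stand-in for an FFmpeg invocation
    start, end = job
    return f"seg_{start:06d}.m4s"

jobs = segment_jobs(3600)               # 1-hour video -> 900 independent jobs
with ThreadPoolExecutor(max_workers=8) as pool:
    segments = list(pool.map(encode_segment, jobs))  # order-preserving fan-out
```

In the real pipeline the fan-out runs across machines rather than threads, and a final assembly step writes the manifest once every segment job has reported success.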
Metadata DB sharding (L5+)
A single PostgreSQL instance handles roughly 10K–20K QPS on reads with good indexing and hardware. At 500M DAU each triggering 1–2 metadata reads, peak QPS exceeds this comfortably; even with Redis shielding most requests, cache misses and writes accumulate. The natural shard key is video_id (a UUID or hash-based ID), which distributes video rows uniformly. A consistent hashing ring with 8–16 shards handles this metadata workload. Cross-shard queries (e.g. aggregates across all videos) are rare and can run on read replicas with eventual consistency.
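Shard routing by video_id, as a minimal fixed-modulo sketch (a production consistent-hash ring adds virtual nodes so a new shard moves only a fraction of keys):

```python
import hashlib

NUM_SHARDS = 16

def shard_for(video_id: str) -> int:
    """Stable shard assignment: hash the id, take it modulo the shard count."""
    digest = hashlib.sha256(video_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

shard_for("vid_4x2k9")  # always the same shard for the same id
```

Because video_id is random, the hash spreads rows uniformly; the hot-read problem for viral videos is absorbed by the Redis cache in front, not by the shard layout.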
CDN cache warming for new videos (L5+)
When a new video goes live, its segments are not cached anywhere: every early viewer hits CDN origin (the object store), which is both slower and more expensive. For ordinary content, this is fine: the CDN populates its cache organically as early viewers watch. For a high-profile release (major channel launch, live event VOD), a thundering herd of simultaneous viewers can overwhelm origin before the cache warms. The solution is proactive cache warming: once transcoding completes, a background job pre-fetches all segments at all quality levels to the CDN, establishing cache entries before the video is publicly visible. Most CDNs support an API for this.
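A warming job in outline. Since the prefetch/preload API varies by CDN vendor, `prefetch` is a stub here, and the URL scheme and tier names are illustrative:

```python
# Pre-fetch every segment at every quality tier before the video goes public.
QUALITIES = ["360p", "480p", "720p", "1080p", "2160p"]

def warm_video(video_id: str, segment_count: int, prefetch) -> int:
    """Issue one prefetch per (quality, segment); returns how many were issued."""
    warmed = 0
    for quality in QUALITIES:
        for i in range(segment_count):
            url = f"https://cdn.example.com/videos/{video_id}/{quality}/seg_{i:06d}.m4s"
            prefetch(url)       # in production: batched, rate-limited, retried
            warmed += 1
    return warmed

urls = []
warm_video("vid_4x2k9", segment_count=3, prefetch=urls.append)  # 5 tiers x 3 segments
```

In practice you would warm only the PoPs where the audience is expected and possibly only the middle tiers, since warming every segment of a long video at every edge is itself an egress cost.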
Multi-region storage and failover (L6+)
Object stores like S3 are single-region by default. A region outage would make all segments in that region inaccessible. The durability NFR (11 nines) is about disk failure, not region failure; for the streaming availability NFR (99.99%), cross-region replication is required. Options: S3 Cross-Region Replication (asynchronous, minutes of lag), a natively multi-region object store (Cloudflare R2 distributes data across regions by default), or running dual S3 buckets in separate regions with application-level write fan-out on upload. The CDN's pull-on-miss should be configured to fail over to the secondary origin if the primary returns an error.
Storage tiering and cost optimization (L7+)
The long tail of video content follows a power law: roughly 20% of videos account for 80% of views. The bottom 50% of videos may never be watched again. Storing all of them in hot standard storage is expensive. Storage lifecycle policies can move videos older than 90 days with fewer than 100 views to cold storage (S3 Glacier, Nearline) at 70–90% cost reduction. Retrieval takes minutes, but for a video nobody has watched in months, a brief delay on re-access is acceptable. The most extreme approach, taken by some platforms, is to delete segments for content below a certain engagement threshold and only retain the raw master file, re-transcoding on demand if someone requests playback.
Failure modes & edge cases
| Scenario | Problem | Solution | Level |
|---|---|---|---|
| Upload connection drops mid-chunk | Partial chunk uploaded; client cannot determine which chunks succeeded | Multipart upload with per-chunk ETags: client tracks acknowledged ETags and resumes from the last unacknowledged chunk. S3 multipart upload natively supports this. | L4 |
| Transcoding worker crashes mid-job | Partially encoded segments written; Kafka message not acknowledged; job will be re-delivered | Workers write segments to a temp prefix and atomically move to the final prefix only on job completion. Re-delivery produces a clean re-run. Idempotent job design is essential. | L5 |
| Transcoding queue backlog spike | New uploads process slowly; creator sees "processing" for hours | Auto-scale transcoding worker fleet based on Kafka consumer group lag. Use spot VMs with graceful preemption handling (checkpoint current segment before shutdown). Set a max-processing-time alarm. | L5 |
| CDN edge node returns stale manifest | Viewer gets a manifest pointing to segments that don't exist yet (processing still ongoing) | Manifest is not pushed to CDN until all segments are encoded and confirmed in object store. Status field in metadata prevents manifest_url from being returned while status is 'processing'. | L5 |
| Redis counter service outage | View count increments are lost during outage window | View counts are eventually consistent by NFR, some loss is acceptable in steady state. On Redis recovery, counts resume from the last flushed value. For critical accuracy, write view events to a durable Kafka topic as the source of truth, with Redis as a derived projection. | L5 |
| Viral video: CDN thundering herd on publish | Millions of simultaneous viewers hit CDN origin before cache warms, overwhelming the object store | Proactive cache warming (§9): push all segments to CDN edge before making video public. For breaking events, delay public release by 30s to allow warming to complete. | L6 |
| Object store region outage | All video segments in that region become inaccessible; streaming fails globally | Cross-region replication for segments bucket. CDN configured to fail over to secondary origin. RTOs measured in minutes for cached content (already at edge); uncached content RTOs depend on replication lag. | L6 |
| A/B testing different encoding parameters | Changing segment length, bitrate ladder, or codec affects all existing manifests | Version manifests: store manifests at versioned paths (e.g. /v2/master.m3u8). New uploads use v2; old content retains v1. Backfill re-encoding as a background job, gated by cost analysis. | L7/L8 |
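The temp-prefix idempotency pattern from the table can be sketched with a local directory standing in for the object store. Object stores have no atomic rename; there, the "publish" step is flipping the status row only after every segment PUT succeeds, but a local directory illustrates the all-or-nothing shape:

```python
import os
import tempfile

def transcode_and_publish(root: str, video_id: str, segments: dict[str, bytes]):
    """Write all segments under a temp prefix, then publish atomically."""
    tmp = os.path.join(root, f".tmp-{video_id}")
    final = os.path.join(root, video_id)
    os.makedirs(tmp, exist_ok=True)
    for name, data in segments.items():   # a crash here leaves only .tmp-* debris
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    os.replace(tmp, final)                # publish: atomic rename on POSIX

root = tempfile.mkdtemp()
transcode_and_publish(root, "vid_4x2k9", {"seg_000000.m4s": b"\x00"})
```

A re-delivered job simply re-creates the temp prefix and re-runs; viewers only ever see a fully published prefix, which is what makes the Kafka re-delivery in the table a clean re-run rather than a corruption risk.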
How to answer by level
L3/L4 (Working end-to-end system)
- Identifies upload and playback as separate paths
- Mentions object storage for video files
- Knows videos must be transcoded to different qualities
- Mentions CDN for delivery
- Can sketch a basic schema with video metadata
Red flags at this level:
- Treats upload and streaming as the same API surface
- No mention of adaptive bitrate, just "different quality buttons"
- Synchronous transcoding during upload request
- Serving video directly from app servers, not CDN
L5: Tradeoffs and pipeline design
- Explains ABR: segments, manifest, client-side quality selection
- Designs the async transcoding pipeline with Kafka decoupling
- Identifies view count as a separate write-heavy concern
- Knows the CDN caching model (pull-on-miss, TTL)
- Can discuss HLS vs MPEG-DASH tradeoffs
- Owns the encoding pipeline end-to-end, including failure handling
- Addresses hot-row problem on view counts with Redis buffer
- Considers multipart upload resumability
- Discusses CDN cache warming for viral content
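The ABR signals above all hang off the master manifest, which is just a text file listing one variant playlist per rung of the bitrate ladder; the client re-picks a rung per segment from measured throughput. A minimal sketch that renders one, assuming an illustrative four-rung ladder and hypothetical `/videos/{id}/{rung}/` paths:

```python
# Hypothetical bitrate ladder; real ladders are tuned per title.
LADDER = [
    ("1080p", 5_000_000, "1920x1080"),
    ("720p",  2_800_000, "1280x720"),
    ("480p",  1_400_000, "854x480"),
    ("240p",    400_000, "426x240"),
]

def master_manifest(video_id: str) -> str:
    """Render an HLS master playlist: one EXT-X-STREAM-INF entry per
    rung, each pointing at that rung's media playlist of segments."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for name, bandwidth, resolution in LADDER:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}"
        )
        lines.append(f"/videos/{video_id}/{name}/playlist.m3u8")
    return "\n".join(lines)
```

Generating the manifest server-side like this is also where the status gate from the failure-mode table lives: the function simply isn't called until every rung's segments are confirmed in the object store.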
L6: Platform-level ownership
- Designs segment-level parallel encoding for large videos
- Addresses cross-region storage replication for availability NFR
- Talks about metadata DB sharding strategy
- Proactive CDN cache warming for new popular content (an L5 topic; L6 goes deeper: CDN API integration, cost-gating logic, and warming only above a traffic threshold)
- Views the system as a product with creator and viewer personas
- Reasons about codec selection (AV1 vs H.264) and its cost implications
- Addresses manifest versioning and encoding parameter changes
- Considers ISP-level CDN co-location economics
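Segment-level parallel encoding, listed above, is a fan-out problem: split the source into fixed-length segments, encode each (segment, rung) pair independently, and collect the outputs. A sketch under assumed parameters: 4-second segments and a stubbed `encode_segment` standing in for the real encoder invocation (ffmpeg or similar).

```python
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SECONDS = 4  # assumed segment length

def encode_segment(video_id, index, rung):
    # Stand-in for encoding one segment at one rung; returns its path.
    return f"{video_id}/{rung}/seg{index:05d}.ts"

def encode_video(video_id, duration_s, rungs):
    """Fan a video out into independent (segment, rung) encode tasks.
    A 2-hour video becomes ~1800 segments per rung, so wall-clock time
    is bounded by the slowest single segment, not the whole file."""
    n = -(-duration_s // SEGMENT_SECONDS)  # ceil division
    tasks = [(i, r) for r in rungs for i in range(n)]
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(lambda t: encode_segment(video_id, *t), tasks))
```

Because each task is independent and idempotent (same input segment, same output path), this shape also plays well with spot VMs: a preempted worker's tasks are simply re-queued.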
L7/L8: Cost, codec, and platform strategy
- Drives codec decisions from unit economics: AV1's 30-50% smaller files translate directly into egress savings at scale
- Considers storage lifecycle: hot/cold tiering by engagement
- Reasons about build vs. buy for CDN (Open Connect equivalent)
- Designs the ABR algorithm strategy, not just the infrastructure
- Addresses the analytics pipeline as a first-class concern
- Connects every architectural decision back to its cost or reliability implication
- Questions the requirements before designing (live vs. VOD has very different architecture)
- Thinks about creator tooling and the upload experience as a competitive differentiator
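The AV1 egress argument above is worth being able to do on a whiteboard. A hedged back-of-envelope: the per-GB rate here is an illustrative assumption, not a quoted price, and 2.25 GB/hour follows from ~5 Mbps 1080p H.264.

```python
H264_GB_PER_HOUR = 2.25   # ~5 Mbps 1080p: 5/8 MB/s * 3600 s
AV1_SAVINGS = 0.40        # midpoint of the 30-50% range above
EGRESS_PER_GB = 0.02      # illustrative blended CDN rate, $/GB

def monthly_egress_cost(watch_hours, gb_per_hour, rate=EGRESS_PER_GB):
    """Egress bill = delivered bytes * rate."""
    return watch_hours * gb_per_hour * rate

# At an assumed 1B watch-hours/month:
h264 = monthly_egress_cost(1e9, H264_GB_PER_HOUR)                      # ~$45M
av1 = monthly_egress_cost(1e9, H264_GB_PER_HOUR * (1 - AV1_SAVINGS))   # ~$27M
```

An ~$18M/month delta is what makes the extra AV1 encode cost (and the client decode-support matrix) a platform decision rather than an infrastructure detail.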
Classic interview probes
| Question | L3/L4 | L5/L6 | L7/L8 |
|---|---|---|---|
| "Why do we use adaptive bitrate instead of just picking a quality upfront?" | Network conditions change during playback, a fixed quality would cause buffering on speed drops. | Explains the segment model and manifest, client-side throughput measurement, and buffer health as inputs to the ABR algorithm. | Discusses per-title encoding (encoding ladder customised per video's visual complexity) and per-device ABR policy differences (mobile vs. smart TV). |
| "How would you handle a video that gets 10 million views in the first 5 minutes?" | CDN handles it, lots of servers serving the file. | CDN cache warming before publish, thundering herd prevention, and origin shield (CDN-to-origin fan-in reduction layer). | Pre-positioning strategy, co-location with major ISPs, and cost modeling for egress at 10M concurrent 1080p viewers (~40 Tbps). |
| "The transcoding queue is backed up, new uploads take 2 hours to become available. What do you do?" | Add more servers / workers to the transcoding fleet. | Auto-scale on Kafka consumer group lag, spot VMs with checkpoint-based preemption handling, and segment-level parallelism within a single video job. | Rethinks the encoding DAG, prioritizes short videos over long ones in the queue, and considers streaming transcoding (serve segments as they complete rather than waiting for the whole video). |
| "How would you add support for live streaming?" | Live streaming needs a different architecture, segments must be published in near real-time as the stream is captured, not after the full upload completes. An ingest server segments the incoming stream on-the-fly and pushes each segment to the CDN as it's ready. | Distinguishes live vs. VOD: live requires real-time segmentation and immediate CDN push, very low segment TTL (2–4s), and a different latency target (LL-HLS or WebRTC for low latency). | Addresses DVR (time-shift buffer), adaptive ingest from unstable encoder connections, multi-CDN failover for live events, and the architectural tension between latency and reliability in live delivery. |
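The probes above keep returning to the ABR algorithm, so it helps to have a concrete shape for it. A simplified rate-selection rule using the two inputs named in the table, measured throughput and buffer health; the thresholds and safety factor are illustrative, and real players use smoothed throughput estimates rather than a single sample.

```python
def choose_rung(ladder_bps, measured_bps, buffer_s,
                low_buffer=10, safety=0.8):
    """Pick the highest bitrate the connection can sustain.
    ladder_bps: rung bitrates, sorted descending.
    measured_bps: throughput estimate from recent segment downloads.
    buffer_s: seconds of video currently buffered; when it's low,
    be conservative regardless of measured throughput."""
    budget = measured_bps * safety
    if buffer_s < low_buffer:
        budget *= 0.5            # near rebuffer: halve the budget
    for bps in ladder_bps:       # descending, so first fit is highest
        if bps <= budget:
            return bps
    return ladder_bps[-1]        # always return the floor rung
```

This is the per-segment decision loop: every few seconds the client re-measures, re-chooses, and fetches the next segment from the chosen rung's playlist.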
How the pieces connect
Every architectural decision traces back to a requirement or a number, and the same threads recur across the neighboring classic designs. These companion guides cover the adjacent territory.
- Rate Limiter System Design, atomic Redis operations, distributed race conditions, and multi-tier quota enforcement
- URL Shortener System Design, hash encoding tradeoffs, database sharding strategies, and viral key mitigation
- Web Crawler System Design, Bloom filter deduplication, politeness throttling, and distributed frontier design
- Twitter/X Feed System Design, fan-out write amplification, hybrid push/pull strategy, and celebrity threshold design
- Notification Service System Design, multi-channel delivery, idempotency keys, and priority queues at scale
- Search Autocomplete System Design, Trie data structures, prefix caching, and read-heavy scale strategies
- Key-Value Store System Design, Consistent hashing, quorum consensus, and SSTable fundamentals
- Chat System (WhatsApp) System Design, WebSocket management, transient vs persistent storage, and read receipts
- Distributed Message Queue System Design, Kafka partition tuning, exactly-once delivery, and geo-replication
- File Storage (Dropbox / Google Drive) System Design, chunking, delta sync, conflict resolution, and global deduplication
- Ride-Sharing System Design (Uber / Lyft), geohashing, WebSocket-driven location tracking, and ETA prediction
- Payment Processing System Design, idempotency keys, exactly-once semantics, and append-only ledger models
- Top-K Leaderboard System Design, Redis sorted sets, approximate counting, and stream aggregation
- Airbnb Booking & Reservation System, inventory locks, double-booking prevention, and async Elasticsearch sync
- Photo-Sharing Feed System Design, image pipelines, CDN delivery, and social graph scaling
- Proximity Search System Design (Yelp / Google Places), geohash indexing, quadtree partitioning, and Bayesian review ranking
- Online Judge System Design, secure sandboxing, execution queues, and worker scaling