System Design Interview

YouTube System Design Interview Guide

Simple to describe, hard to scale: serving billions of minutes of video per day requires rethinking every layer of the stack.

L3/L4: working system · L5/L6: encoding pipeline & CDN · L7/L8: global scale & cost · ~22 min read
Architecture diagram of a YouTube-like video streaming system showing upload path, async transcoding workers, CDN edge caching, and playback flow
01

What the interviewer is testing

Video streaming comes up in FAANG interviews because it's genuinely hard in multiple directions at once: you're dealing with large binary assets, async processing pipelines, global delivery, and a client-side protocol most engineers have never had to think about. The "store files and serve them" framing candidates arrive with gets demolished pretty quickly.

Where level shows up most clearly: junior candidates know video needs different quality options; senior candidates understand why the file isn't served as a single blob and what ABR actually means in practice; staff and above think about the economics: CDN egress is a different cost category from database reads, and the architecture reflects that.

Level What good looks like
L3/L4 Can identify the main components (upload, storage, streaming), separate upload from viewing, and mention different video qualities.
L5 Understands adaptive bitrate, chunked upload resumability, CDN cache hierarchy, and async transcoding pipeline. Can reason about tradeoffs.
L6 Owns the encoding pipeline end-to-end, designs the metadata service and counter architecture, and addresses failure modes in the transcoding path.
L7/L8 Thinks about cost optimization (CDN egress, storage tiers), multi-region replication strategy, live vs. on-demand tradeoffs, and platform-level decisions around codec selection (AV1 vs. H.264).
02

Requirements clarification

Before drawing anything, separate the two subsystems: the upload and transcoding path (write-heavy, async, latency-tolerant) and the playback path (read-heavy, synchronous, latency-critical). Most design mistakes come from conflating them.

Functional requirements

Requirement Notes
Upload video Support large files (up to ~10 GB), resumable if interrupted
Transcode to multiple qualities 360p, 480p, 720p, 1080p, 4K; async after upload
Stream video adaptively Client selects quality dynamically based on bandwidth
Video metadata Title, description, tags, uploader, duration, thumbnail
Search & browse Search by title/tag, home feed, channel view; out of scope for this design
View count & engagement Like count, views, eventually consistent is acceptable

Non-functional requirements

Requirement Target
Playback start latency < 2s to first frame on good network
Rebuffering rate < 1% of sessions experience a stall
Upload availability 99.9%; a failed upload is recoverable by retrying
Streaming availability 99.99%; a playback interruption is a direct user-experience failure
Transcoding latency Processing begins within 60s of upload completion; standard video available within 5 min
Storage durability 11 nines (same as S3 standard); data loss is unacceptable
View count accuracy Eventual consistency acceptable; counts may lag by 60s
Playback latency < 2s (NFR reasoning)

Two seconds to first frame is the boundary where users begin to perceive "slowness." Studies on streaming abandonment show a sharp cliff at 3s: roughly 10% of viewers leave for every additional second of startup delay. The 2s target drives the requirement for CDN edge caching (§8): without it, every playback request would traverse the full network path to origin, adding 100–500ms of latency globally.

What this drives: CDN edge node placement (§4), segment prefetching in the ABR client (§5), and pre-warming the CDN cache for popular content (§8).
View count eventual consistency (NFR reasoning)

View counts are inherently a high-write, low-precision metric. At 500M daily active users, even a single view event per user per day averages ~5.8K writes/second, and real traffic is several views per user with pronounced peaks, putting sustained write rates in the tens of thousands per second. Counting with strong consistency would funnel all the writes for a popular video into a single row, creating a hot-spot. A 60-second lag is imperceptible to users and has no product-level consequence. This drives the Redis counter buffer + periodic flush design in §8.

What this drives: Counter service with Redis buffer (§7, §8), durable queue flush pipeline (§9).
Streaming availability 99.99% vs upload 99.9% (NFR reasoning)

These are intentionally asymmetric. An upload failure is a recoverable event: the uploader gets an error and retries. A playback failure during an active session is an irreversible experience failure: the user has already started watching. 99.99% means ~53 minutes of downtime per year; 99.9% allows ~8.7 hours. This asymmetry justifies different infrastructure costs for the two paths: the streaming path warrants multi-region failover; the upload path does not.

What this drives: Multi-region CDN (§4), separate upload and streaming API surfaces (§4), and failure isolation between transcoding and delivery (§10).
03

Capacity estimation

Video systems are unusual in that storage and bandwidth dominate costs rather than compute or QPS. A single minute of 1080p video compresses to roughly 30 MB at H.264 (~4 Mbps). Encoding all 5 quality tiers (360p through 4K) multiplies that by ~5.9×: 4K alone is roughly 4× the size of 1080p, and the lower tiers (720p, 480p, 360p) add another ~0.9× combined. CDN egress, not storage, is usually the biggest line item: a popular video served to 1M viewers costs far more to deliver than to store.

Interactive capacity estimator (defaults: 500M DAU · 40 min average daily watch time per user · 500 hours of video uploaded per day · 4 Mbps average streaming bitrate).
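
A back-of-envelope version of the estimate, using the defaults above and §3's per-minute ladder math. All inputs are illustrative assumptions:

# Back-of-envelope capacity math using the estimator defaults above.
# All inputs are illustrative assumptions, not measured figures.
DAU = 500_000_000            # daily active users
WATCH_MIN_PER_USER = 40      # average watch minutes per user per day
UPLOAD_HOURS_PER_DAY = 500   # hours of video ingested per day
AVG_BITRATE_MBPS = 4         # average delivered bitrate (1080p H.264)

# Egress: total watch-seconds/day x bitrate, averaged over 86,400 s
watch_seconds_per_day = DAU * WATCH_MIN_PER_USER * 60
avg_egress_gbps = watch_seconds_per_day * AVG_BITRATE_MBPS / 86_400 / 1_000
print(f"average CDN egress: {avg_egress_gbps:,.0f} Gbps")   # ~55,556 Gbps ≈ 56 Tbps

# New storage: uploaded minutes/day x ~30 MB/min at 1080p x ~5.9x for all tiers
new_storage_tb_per_day = UPLOAD_HOURS_PER_DAY * 60 * 30 * 5.9 / 1_000_000
print(f"new encoded storage: {new_storage_tb_per_day:.1f} TB/day")  # ~5.3 TB/day

The estimate makes the cost asymmetry concrete: average delivery bandwidth dwarfs daily storage growth, which is why §4 and §8 optimize the delivery path first.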

04

High-level architecture

A video streaming platform separates two fundamentally different subsystems: the upload and transcoding path (write-heavy, asynchronous, latency-tolerant) and the playback path (read-heavy, synchronous, latency-critical). Producers upload raw video files; these are asynchronously encoded into multiple quality variants and distributed to CDN edge nodes. Viewers stream encoded segments directly from the nearest CDN edge, with the player selecting quality adaptively based on available bandwidth.

Upload path (sync until the queue, async after): Creator Client → Upload API (chunked / resumable) → Object Store (raw video) → Message Queue (Kafka) → Transcoding Workers (async) → Object Store (encoded segments). Delivery tier: CDN Origin (pulls from store) → CDN Edge (cached segments). Playback path: Viewer Client + ABR → API Gateway (metadata / manifest) → Metadata DB (PostgreSQL); segment requests go to CDN Edge, falling through to origin on miss.

Upload path (top): chunked upload → object store → async transcoding. CDN tier (middle): encoded segments flow down to CDN Origin; CDN Edge caches and serves. Playback path (bottom): viewer fetches manifest via API Gateway, then streams segments directly from CDN Edge.

Upload API orchestrates uploads without ever touching the video bytes. It issues pre-signed S3 multipart upload URLs to the client; the client uploads each chunk directly to S3, bypassing the API server entirely. Once the client calls /upload/complete with the per-chunk ETags, the API calls S3's CompleteMultipartUpload and publishes a transcoding job to the queue. This keeps the Upload API stateless and prevents it from becoming a bandwidth bottleneck: S3 absorbs all the byte-transfer load.

Object Store holds two kinds of data: raw uploads awaiting transcoding, and the encoded segment files that are the actual playback artefacts. These are structurally separate prefixes within the same store. The raw bucket is ephemeral (files may be lifecycle-deleted after 30 days); the segments bucket is permanent and must be globally durable.

Message Queue (a durable log like Kafka) decouples the upload API from the transcoding workers. Without this, upload latency becomes coupled to transcoding capacity: a transcoding backlog would block new uploads. The queue also provides retry semantics: if a transcoding job fails, its offset is never committed and the job gets re-delivered.

Transcoding Workers consume jobs from the queue, pull raw video from the object store, run FFmpeg (or a custom encoder), and write the output segments back to the object store. The work is CPU-bound and embarrassingly parallel: during peak upload hours you scale out the fleet; overnight you scale it back. Workers are stateless, so there's no coordination overhead.

Metadata DB stores video metadata (title, description, uploader, duration, manifest URL, processing status). This is a relational workload; a SQL database (PostgreSQL) is appropriate. The metadata record is created at upload time with status processing and updated to ready once transcoding completes.

CDN Edge (a content delivery network like Cloudflare or Akamai) serves video segments from points of presence near the viewer. The CDN pulls segments from the object store on first request and caches them at the edge. For popular videos, 95%+ of requests are served entirely from edge cache without touching origin.

API Gateway handles the viewer's two non-video requests: fetching video metadata and fetching the manifest file that describes available qualities and segment URLs. These are tiny JSON payloads; the gateway routes to the metadata service, which hits the database (or cache) and returns within milliseconds.

Architectural rationale

Separate upload and playback API surfaces (isolation)

Upload is write-heavy, latency-tolerant, and intermittently bursty (viral event → sudden upload spike). Playback is read-heavy, latency-critical, and needs 99.99% availability. Running them as separate services means a transcoding backlog or upload service incident cannot degrade streaming. This directly addresses the asymmetric NFRs from §2.

Tradeoff: Two API surfaces to operate, but the failure isolation is worth it. Uploads can degrade gracefully; playback must not.
Alternative: a unified gateway with upload/stream routes; simpler to start with, but it creates coupling at the wrong layer.
Async transcoding via message queue (decoupling)

Transcoding a 1-hour video takes minutes even on fast hardware, far too slow to do synchronously in an upload request. The queue (Kafka) acts as a buffer: upload completes instantly, transcoding happens when workers are available. Consumer group offsets give at-least-once delivery; paired with idempotent job design (§10), a re-delivered job cannot produce duplicate output.

Tradeoff: Adds operational complexity (a Kafka cluster). A simpler alternative at an early stage is a job queue backed by PostgreSQL (advisory locks) or Redis (LMOVE, formerly RPOPLPUSH). Kafka becomes necessary at multi-thousand-jobs/minute scale.
CDN-first delivery, segments never served from origin (cost + latency)

Video segments are immutable once encoded: a given segment URL will always return the same bytes. This is ideal for CDN caching: a segment cached at an edge node serves unlimited viewers without touching origin. Without a CDN, a single viral video could saturate an entire datacenter's egress. The CDN doesn't just improve latency; it makes the unit economics of video streaming viable.

Tradeoff: CDN egress is still the largest cost line. At scale, platforms push this further by co-locating with ISPs (Netflix's Open Connect) to serve from within the ISP's network, eliminating third-party CDN cost for top-tier traffic.

Real-world comparison

Decision This design YouTube Netflix
Streaming protocol HLS or MPEG-DASH MPEG-DASH (custom) MPEG-DASH
Encoding FFmpeg workers on Kafka jobs Spanner-coordinated transcoding Cosmos (distributed encoding)
CDN delivery Third-party CDN (Cloudflare/Akamai) Google's own CDN infrastructure Open Connect (ISP-embedded)
Primary codec H.264 (widest compat) VP9 / AV1 H.264 / H.265 / AV1
Metadata store PostgreSQL Bigtable + Spanner Cassandra + MySQL
💡

Every major platform eventually builds custom infrastructure around CDN delivery because egress is the dominant cost. The decisions cascade from codec (AV1 compresses ~30% better than H.264, which translates directly to egress savings at scale) through to delivery architecture. YouTube and Netflix chose differently from each other, and both chose differently from this design, because their scale, device targets, and cost structures differ.

05

Adaptive bitrate streaming

Adaptive bitrate (ABR) streaming is a video delivery technique where the source video is cut into short segments (typically 4–6 seconds) and encoded at multiple quality levels — for example 360p, 720p, 1080p, and 4K. A manifest file (HLS .m3u8 or DASH .mpd) lists all available quality variants and their segment URLs. The client player measures current network bandwidth and buffer fill, then requests each segment at the highest quality it can sustain without rebuffering. The server’s job is to make all versions available; the adaptation logic lives entirely in the player.

Most distributed systems serve one canonical response to a request. Video streaming serves dozens of versions of the same content; the client picks which one to fetch next, segment by segment, based on what its network can handle right now.

The mechanism: cut the video into short segments (4–6 seconds each), encode each segment at multiple quality levels, and publish a manifest listing all the options. The client downloads the next segment at whatever quality its measured throughput can sustain. Because it's only committing 4–6 seconds ahead, it can step down quality quickly when the network degrades, without making the viewer wait for a full rebuffer.

Source (raw upload) → Segmenter (4–6s chunks) → parallel encoding workers (4K / 15 Mbps · 1080p / 4 Mbps · 720p / 2.5 Mbps · 480p / 1 Mbps · 360p / 0.5 Mbps) → Manifest (.m3u8 / .mpd) → ABR Client (measures bandwidth, monitors buffer health, selects quality per segment).

A single source video produces 5 quality variants. The manifest lists all variants; the ABR client picks which to request, segment by segment, based on measured bandwidth and buffer state.

The two standard protocols: HLS and MPEG-DASH

HLS (HTTP Live Streaming) MPEG-DASH
Origin Apple (2009) MPEG consortium (2012)
Manifest format .m3u8 (plain text) .mpd (XML)
Segment format MPEG-TS or fMP4 fMP4 (fragmented)
Native device support All Apple devices natively; other browsers via a JS player (hls.js) Android, Smart TVs, browsers (via JS library)
Codec flexibility Limited; H.264/H.265 primary Any codec (AV1, VP9, H.265)
Royalties Apple patents Royalty-free

Our choice for this design is HLS with fMP4 segments. HLS offers the broadest device compatibility (including all Apple devices), and fMP4 segments are compatible with most DASH players, giving us the option to serve both protocols from the same segment files. For interview purposes, either choice is fine; the important thing is explaining the tradeoff.
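
For concreteness, a minimal HLS master playlist for the five-tier ladder above might look like this; URLs, BANDWIDTH figures, and codec strings are illustrative placeholders:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-STREAM-INF:BANDWIDTH=600000,RESOLUTION=640x360,CODECS="avc1.64001e,mp4a.40.2"
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480,CODECS="avc1.64001f,mp4a.40.2"
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720,CODECS="avc1.640020,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4400000,RESOLUTION=1920x1080,CODECS="avc1.64002a,mp4a.40.2"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15600000,RESOLUTION=3840x2160,CODECS="avc1.640034,mp4a.40.2"
2160p/playlist.m3u8

Each variant URL points to a media playlist listing the individual 4–6s fMP4 segment URLs; the player re-evaluates which variant to fetch from before every segment request.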

ABR algorithm internals (advanced)

The ABR algorithm runs on the client and makes a quality decision before fetching each segment. The simplest algorithm (throughput-based) measures the download speed of the last N segments and picks the highest bitrate that fits within 80% of that speed. More sophisticated algorithms (BOLA, MPC) consider buffer occupancy: if the buffer is draining fast, step down quality; if the buffer is full, it's safe to step up. Pensieve (MIT, 2017) used reinforcement learning to train per-network-condition ABR policies, an example of where L7-level thinking applies.
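
A minimal sketch of the throughput-based selector just described; the ladder comes from §5's encoding tiers, while the window size and the 0.8 safety factor are illustrative assumptions:

# Sketch: throughput-based ABR bitrate selection (client-side).
# Ladder from §5; window size and safety factor are illustrative assumptions.
BITRATE_LADDER_KBPS = [500, 1000, 2500, 4000, 15000]   # 360p .. 4K
SAFETY_FACTOR = 0.8     # only budget 80% of measured throughput
WINDOW = 3              # average over the last N segment downloads

def pick_bitrate(recent_throughputs_kbps: list[float]) -> int:
    """Highest ladder rung sustainable at 80% of recent throughput."""
    window = recent_throughputs_kbps[-WINDOW:]
    budget = (sum(window) / len(window)) * SAFETY_FACTOR
    sustainable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return max(sustainable) if sustainable else BITRATE_LADDER_KBPS[0]

# ~6 Mbps measured -> budget ~4.8 Mbps -> picks 4000 kbps (1080p)
print(pick_bitrate([6100, 5800, 6300]))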

Interview signal: Knowing that the ABR algorithm is a client-side concern, not a server-side one, is a useful differentiator. The server just needs to make all quality levels available; the intelligence of adaptation lives in the player.
🎯

Common follow-up: "What segment length should we use?" Shorter segments (2s) give faster adaptation but more HTTP overhead. Longer segments (10s) are more efficient but adapt slowly to network changes. Most platforms converge on 4–6 seconds as the sweet spot. Segment length is fixed at encoding time and encoded into the manifest.

05b

API design

POST /videos/upload/init: Initiate chunked upload

// Request
{
  "filename": "vacation.mp4",
  "file_size_bytes": 2147483648,  // 2 GB
  "content_type": "video/mp4",
  "title": "Summer Vacation 2025",
  "description": "..."
}

// Response 201 Created
{
  "upload_id": "upl_8f3k2j9...",
  "video_id": "vid_4x2k9...",
  "chunk_urls": [           // pre-signed S3 multipart URLs
    { "part_number": 1, "url": "https://s3.../..." },
    // ...
  ],
  "chunk_size_bytes": 5242880  // 5 MB chunks
}

The upload API returns pre-signed S3 multipart upload URLs. The client uploads each chunk directly to S3; the upload API server never touches the video bytes, only the metadata. This prevents the API server from becoming a bandwidth bottleneck and offloads the heavy lifting to S3's durable storage layer.
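
A sketch of how the init endpoint might mint those URLs with boto3; the bucket name, expiry, and chunk sizing are assumptions, while create_multipart_upload and generate_presigned_url are standard boto3 calls:

# Sketch: issue pre-signed multipart-upload URLs from the init endpoint.
# Bucket name, expiry, and chunk size are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
BUCKET = "raw-uploads"                     # hypothetical raw-video bucket
CHUNK_SIZE = 5 * 1024 * 1024               # 5 MB, matching the response above

def init_multipart(key: str, file_size: int) -> dict:
    mpu = s3.create_multipart_upload(
        Bucket=BUCKET, Key=key, ContentType="video/mp4")
    n_parts = -(-file_size // CHUNK_SIZE)  # ceiling division
    chunk_urls = [
        {"part_number": i,
         "url": s3.generate_presigned_url(
             "upload_part",
             Params={"Bucket": BUCKET, "Key": key,
                     "UploadId": mpu["UploadId"], "PartNumber": i},
             ExpiresIn=3600)}
        for i in range(1, n_parts + 1)]
    return {"upload_id": mpu["UploadId"], "chunk_urls": chunk_urls}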

🔒

Auth and abuse prevention (don't skip this in an interview): The upload init endpoint must require authentication (OAuth token or API key); unauthenticated upload would allow anyone to fill your object store. Rate-limit per creator (e.g. 10 uploads/hour) to prevent abuse. Validate content_type server-side before issuing pre-signed URLs: only issue URLs for video MIME types. File-type validation beyond MIME type (a magic-bytes check) happens after the upload lands in the raw bucket, before the transcoding job is enqueued.
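
A sketch of that magic-bytes check for the MP4 container only (ISO BMFF layout: a 4-byte box size followed by 'ftyp'); other containers would need their own signatures, and the file path is a hypothetical local copy:

# Sketch: verify an upload is really an MP4 before enqueueing a transcode job.
# Checks the ISO BMFF 'ftyp' box rather than trusting the declared MIME type.
def looks_like_mp4(first_bytes: bytes) -> bool:
    # ISO BMFF files begin with a box header: 4-byte size, then 'ftyp'.
    return len(first_bytes) >= 12 and first_bytes[4:8] == b"ftyp"

with open("upload.bin", "rb") as f:   # hypothetical local copy or S3 range-read
    if not looks_like_mp4(f.read(12)):
        raise ValueError("rejected: not an MP4 container")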

POST /videos/upload/complete: Finalize and trigger transcoding

// Request
{
  "upload_id": "upl_8f3k2j9...",
  "parts": [
    { "part_number": 1, "etag": "\"abc123\"" },
    // ...
  ]
}

// Response 202 Accepted
{
  "video_id": "vid_4x2k9...",
  "status": "processing",
  "estimated_ready_at": "2025-06-01T12:05:00Z"
}

GET /videos/:id: Fetch video metadata and manifest

// Response 200 OK
{
  "video_id": "vid_4x2k9...",
  "title": "Summer Vacation 2025",
  "uploader": { "id": "usr_...", "name": "Alice" },
  "duration_seconds": 342,
  "view_count": 18420,
  "status": "ready",           // or "processing" | "failed"
  "manifest_url": "https://cdn.example.com/videos/vid_4x2k9/master.m3u8",
  "thumbnail_url": "https://cdn.example.com/thumbs/vid_4x2k9.jpg"
}
Endpoint Level Notes
DELETE /videos/:id L5 Soft delete: mark metadata as deleted; CDN cache TTL expires naturally. Hard delete from object store is async.
GET /videos/:id/captions L5 Returns available caption tracks (WebVTT format). Auto-generated via async ASR pipeline.
POST /videos/:id/views L5 Increments view counter. Idempotent per session via session-token dedupe in Redis.
GET /videos/:id/analytics L7/L8 Per-segment watch-time heatmap. Requires a separate analytics pipeline and a columnar store (ClickHouse or BigQuery).
🖼️

Thumbnail generation: Thumbnails are extracted as a lightweight step inside the transcoding pipeline. After the first pass of encoding completes, FFmpeg extracts a frame at the 10% mark of the video duration and stores it to the object store (e.g. thumbs/{video_id}.jpg). The metadata record's thumbnail_url is populated at the same time the manifest_url is set, so both become available atomically when status transitions to ready.
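
A sketch of that extraction step; the paths are illustrative, while the ffmpeg flags shown (input seek, single frame, JPEG quality) are standard:

# Sketch: extract a thumbnail at the 10% mark of the video with ffmpeg.
# Input/output paths are illustrative assumptions.
import subprocess

def extract_thumbnail(src: str, dst: str, duration_sec: float) -> None:
    seek = duration_sec * 0.10            # the 10% mark, per the note above
    subprocess.run(
        ["ffmpeg",
         "-ss", f"{seek:.2f}",            # seek before -i: fast keyframe seek
         "-i", src,
         "-frames:v", "1",                # grab a single frame
         "-q:v", "2",                     # high JPEG quality
         "-y", dst],
        check=True)

extract_thumbnail("raw/vid_4x2k9.mp4", "thumbs/vid_4x2k9.jpg", 342.0)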

06

Core flows: upload & playback

Upload flow

The latency NFR for uploads is relaxed: users expect processing to take minutes, not milliseconds. The primary requirements are reliability (no data loss) and resumability (a dropped connection should not require re-uploading 2 GB). These requirements together drive the multipart upload pattern, where each chunk is independently acknowledged before the next is sent.

Client → Upload API: POST /upload/init → 201 + pre-signed chunk URLs. Client → Object Store: PUT chunk directly to S3 (×N) → 200 + ETag per chunk. Client → Upload API: POST /upload/complete + ETags → Upload API: CompleteMultipartUpload against S3, publish transcode job to Kafka + Workers → Client: 202 Accepted, status=processing.

Client uploads chunks directly to S3, bypassing the Upload API for byte transfer. API only handles metadata and orchestration. Transcoding is triggered asynchronously.

Playback flow

The playback latency NFR (<2s to first frame, from §2) drives the key design decision here: the manifest and first segment must be available from CDN edge before the player starts buffering. The manifest file is tiny (~1 KB) but is fetched on every play; it goes through the API gateway, which checks the metadata DB for the manifest URL and returns it. The segments themselves are fetched directly by the player from the CDN edge URL embedded in the manifest.

💡

Why this matters for latency: The manifest round-trip through the API gateway adds ~50–100ms latency (fast, cached). If the first CDN edge segment is already cached, the player can start rendering within 200–400ms of playback start. If it needs to fetch from origin (cold cache), add another 100–500ms depending on geography. The CDN cache hit rate for popular content is typically >95%, making cold-cache paths rare in practice.

07

Data model

The video platform has three distinct data domains with very different access patterns. The access patterns are worth looking at first: they're what determine the schema, not the other way around.

Operation Frequency Query shape
Fetch video metadata by ID Very high (every playback) Point lookup by video_id
List videos by uploader High (channel page) Range scan by uploader_id
Increment view count Very high (every play event) Counter increment by video_id
Update processing status Low (once per transcoding job) Point update by video_id
Fetch user profile High Point lookup by user_id

Two things stand out from this table. First, the dominant read pattern is a point lookup by video_id; this is extremely cache-friendly and drives the Redis metadata cache in §8. Second, view count increments are so frequent that they cannot go directly to the primary database without creating a hot-spot on popular video rows; they need their own buffered write path.

Schema

-- Core video record
CREATE TABLE videos (
  video_id      VARCHAR(16)  PRIMARY KEY,
  uploader_id   VARCHAR(16)  NOT NULL REFERENCES users(user_id),
  title         TEXT         NOT NULL,
  description   TEXT,
  duration_sec  INTEGER,
  status        VARCHAR(16)  NOT NULL DEFAULT 'processing',
                             -- 'processing' | 'ready' | 'failed'
  manifest_url  TEXT,        -- null until transcoding completes
  thumbnail_url TEXT,
  created_at    TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

-- View counts live separately, write-heavy, eventually consistent
-- Managed by counter service; flushed here periodically from Redis
CREATE TABLE video_stats (
  video_id      VARCHAR(16)  PRIMARY KEY REFERENCES videos(video_id),
  view_count    BIGINT       NOT NULL DEFAULT 0,
  like_count    BIGINT       NOT NULL DEFAULT 0,
  updated_at    TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_videos_uploader ON videos(uploader_id, created_at DESC);
CREATE INDEX idx_videos_status   ON videos(status) WHERE status = 'processing';
video_stats as a separate table (hot-spot prevention)

Popular videos can receive thousands of view events per second. If view_count lived as a column in the videos table, every increment would lock that row, creating severe write contention. Separating it into video_stats doesn't solve the hot-row problem alone, but combined with the Redis buffer pattern in §8, it means the database only receives periodic bulk flushes rather than per-event increments. The updated_at column lets us confirm flush recency.

Tradeoff: Reads that need both metadata and view count must join two tables. In practice, the metadata service fetches from Redis (which holds both), so this join is rare; it only occurs on cache miss.
status field on the videos table (polling simplicity)

After upload, the client can poll GET /videos/:id to check when status transitions from processing to ready. This is simpler than setting up a websocket or webhook for most upload clients. The partial index on (status) WHERE status = 'processing' makes it efficient to query all in-flight jobs from the monitoring dashboard without a full table scan.

08

Caching strategy

Video streaming uses three distinct caching layers: a CDN edge cache (stores immutable video segments close to viewers, TTL 7 days), a Redis metadata cache (stores serialized video JSON per video_id, handles ~1M QPS, TTL 10 minutes), and a Redis counter buffer (accumulates view and like increments in memory, flushed to PostgreSQL every 60 seconds). The CDN layer is what makes the economics of video streaming viable — at peak, 95%+ of segment requests are served from edge cache without touching origin.

A video uploaded once might be watched ten million times. That read/write ratio shapes every caching decision, and at this scale caching isn't optional; it's what makes the cost model work at all. There are three distinct layers, each addressing a different part of the request path.

① CDN Edge Cache: video segments, thumbnails (TTL 7 days, immutable). ② Redis Metadata Cache: video metadata JSON (TTL 10 min, LRU eviction). ③ Redis Counter Buffer: view/like counts (atomic), flushed every 60s to the DB. Metadata misses fall through to PostgreSQL (video metadata + stats); CDN misses fall through to the Object Store (CDN origin).

Three caching layers: CDN edge for video bytes, Redis for metadata JSON, Redis for write-buffered view counters. Each layer is positioned in the §4 request path.

Layer What it caches Why it exists Invalidation
① CDN Edge Video segments (.ts / .m4s), thumbnails, manifests Latency NFR from §2: <2s to first frame requires edge proximity. Also the only way to serve massive concurrent viewers economically. Segments are content-addressed and immutable; TTL of 7 days, no explicit invalidation needed. Manifests TTL 5 min (status may change).
② Redis Metadata Serialised video metadata JSON per video_id Every playback triggers a metadata fetch. The DB can handle ~10K QPS; Redis handles ~1M QPS, essential at scale. Explicit invalidation on any write: title edits, description changes, and, most critically, when status transitions from processing to ready (this is when manifest_url becomes non-null and must be visible to viewers immediately). TTL of 10 min as a safety net. Write-through on update.
③ Redis Counter In-memory atomic counters: views:{video_id}, likes:{video_id} View counts are write-heavy; direct DB increments at scale cause hot-row lock contention. Redis INCR is O(1) and lock-free. Not a cache: it's the write buffer. A background job periodically flushes deltas to PostgreSQL using UPDATE video_stats SET view_count = view_count + $delta.
Counter flush strategy and accuracy (L5+)

The flush job runs every 60 seconds per the eventual consistency NFR from §2. It captures each counter's current value and resets it to 0 atomically, then writes the delta to PostgreSQL. GETSET has been deprecated since Redis 6.2 (superseded by SET with the GET option); a clean approach is a Lua script, which Redis executes atomically:

-- Atomically capture and reset a counter.
-- Returns the previous value as a string, or nil if the key didn't exist
-- (i.e. no views since the last flush). Caller must handle nil.
local val = redis.call('GET', KEYS[1])
if val then redis.call('SET', KEYS[1], '0') end
return val  -- nil means delta = 0; skip the DB write

The nil return matters: if a video received no views since the last flush, the key won't exist and the script returns nil. A flush job that doesn't handle nil will throw a null-reference error for every cold video in the counter keyspace. The fix is simple: skip the DB write when val is nil. A crash mid-flush can still lose a window of counts (counter reset but the DB write never committed); the eventual consistency NFR explicitly permits this, and the loss is bounded by 60 seconds of view events.
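
A sketch of the flush job calling that script, using redis-py and psycopg2 against the §7 schema; the connection details and key-scan pattern are assumptions:

# Sketch: the 60-second counter flush from Redis to PostgreSQL.
# Connection parameters and the key-scanning strategy are illustrative.
import redis
import psycopg2

CAPTURE_AND_RESET = """
local val = redis.call('GET', KEYS[1])
if val then redis.call('SET', KEYS[1], '0') end
return val
"""

r = redis.Redis()
pg = psycopg2.connect("dbname=videos")       # hypothetical DSN
capture = r.register_script(CAPTURE_AND_RESET)

def flush_counters():
    with pg, pg.cursor() as cur:             # commits the batch on exit
        for key in r.scan_iter(match="views:*"):
            delta = capture(keys=[key])
            if delta is None or int(delta) == 0:
                continue                     # cold video: skip the DB write
            video_id = key.decode().split(":", 1)[1]
            cur.execute(
                "UPDATE video_stats SET view_count = view_count + %s, "
                "updated_at = NOW() WHERE video_id = %s",
                (int(delta), video_id))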

At YouTube scale: View counting moves to a dedicated pipeline: events go through a durable log (Kafka) → a stream-processing engine (Apache Flink or Google Dataflow) for aggregation → periodic bulk upsert to the stats store. This handles deduplication (the same user watching repeatedly), bot filtering, and replay analysis, none of which are possible with a simple Redis INCR.
09

Deep-dive scalability

The following scalability problems don't all arise at the same time. A service at 1M daily users looks different from one at 100M. These are the problems that emerge as you scale, roughly in the order they bite you.

Transcoding pipeline parallelism (L5+)

A naive transcoding worker processes a video sequentially, segment by segment, quality by quality. YouTube processes roughly 500 hours of video per minute; note the estimator defaults to 500 hours per day, about 0.07% of YouTube's actual ingest volume, intentionally scoped to a large-but-not-hyperscaler service. Even at that lower scale, queue backlog is a real concern during upload spikes. The solution is to split work at two levels: across videos (multiple workers consume from Kafka concurrently) and within a single video (segment-level parallelism using a DAG scheduler). Each 4-second chunk can be encoded independently: a 1-hour video has ~900 chunks, each encodeable in parallel.

Netflix has built an internal distributed encoding platform (Cosmos) where encoding is broken into thousands of micro-tasks dispatched to a serverless fleet. A 2-hour movie might run on 10,000 concurrent compute nodes to complete encoding in minutes rather than hours.

Practical starting point: A Kafka consumer group with N workers per quality tier, using spot/preemptible VMs for cost. Each worker handles one video at a time, encoding all qualities sequentially. Add segment-level parallelism only when queue depth consistently exceeds target latency. A minimal worker-loop sketch follows.
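
The sketch below uses the kafka-python client; the topic name, job schema, and the transcode() body are hypothetical:

# Sketch: a transcoding worker consuming jobs with at-least-once semantics.
# Topic name, job schema, and transcode() internals are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transcode-jobs",                  # hypothetical topic name
    group_id="transcoders",
    enable_auto_commit=False,          # commit only after the job succeeds
    value_deserializer=lambda v: json.loads(v))

for msg in consumer:
    job = msg.value                    # e.g. {"video_id": ..., "raw_key": ...}
    try:
        transcode(job)                 # hypothetical: segment, encode all tiers,
                                       # write to temp prefix, atomic move (§10)
        consumer.commit()              # ack: offset advances past this job
    except Exception:
        # Crash without committing: after restart/rebalance the uncommitted
        # offset is re-delivered (at-least-once). Idempotent output paths
        # make the re-run safe.
        raise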
Metadata DB sharding (L5+)

A single PostgreSQL instance handles roughly 10K–20K QPS on reads with good indexing and hardware. At 500M DAU each triggering 1–2 metadata reads, peak QPS exceeds this comfortably; even with Redis shielding most requests, cache misses and writes accumulate. The natural shard key is video_id (a UUID or hash-based ID), which distributes video rows uniformly. A consistent hashing ring with 8–16 shards handles the metadata workload at this scale. Cross-shard queries (e.g. aggregates across all videos) are rare and can run on read replicas with eventual consistency.

Alternative: Cassandra or Bigtable as the metadata store; naturally distributed, good for high-QPS point lookups. Tradeoff: no joins, no transactions, operational complexity. At 100M+ DAU this becomes the right choice; below that, sharded PostgreSQL is operationally simpler.
CDN cache warming for new videos (L5+)

When a new video goes live, its segments are not cached anywhere, every early viewer hits CDN origin (the object store), which is both slower and more expensive. For ordinary content, this is fine: the CDN populates its cache organically as early viewers watch. For a high-profile release (major channel launch, live event VOD), a thundering herd of simultaneous viewers can overwhelm origin before the cache warms. The solution is proactive cache warming: once transcoding completes, a background job pre-fetches all segments at all quality levels to the CDN, establishing cache entries before the video is publicly visible. Most CDNs support an API for this.

Cost consideration: Cache warming incurs CDN transfer fees on every edge node you warm. For non-viral content this is wasteful; the organic approach is more cost-efficient. Target warming at content expected to receive >100K views in the first hour.
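
A sketch of a warming job. Real CDNs expose vendor-specific prefetch APIs; this generic version simply pulls each segment through the edge URL so the cache populates. The URL layout and segment naming are hypothetical:

# Sketch: warm the CDN cache by requesting every segment once.
# URL layout and segment naming are illustrative assumptions.
import requests

CDN_BASE = "https://cdn.example.com/videos"
QUALITIES = ["360p", "480p", "720p", "1080p", "2160p"]

def warm(video_id: str, n_segments: int) -> None:
    for quality in QUALITIES:
        for i in range(n_segments):
            url = f"{CDN_BASE}/{video_id}/{quality}/seg_{i:05d}.m4s"
            # GET rather than HEAD: many CDNs only cache full-body responses.
            requests.get(url, timeout=10).raise_for_status()

warm("vid_4x2k9", n_segments=86)   # ~344s of video at 4s segments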
Multi-region storage and failover (L6+)

Object stores like S3 are single-region by default. A region outage would make all segments in that region inaccessible. The durability NFR (11 nines) is about disk failure, not region failure; for the streaming availability NFR (99.99%), cross-region replication is required. Options: S3 Cross-Region Replication (asynchronous, minutes of lag), a natively multi-region object store (Google Cloud Storage multi-region buckets replicate across regions by default), or running dual S3 buckets in separate regions with application-level write fan-out on upload. The CDN's pull-on-miss should be configured to fail over to the secondary origin if the primary returns an error.

Storage tiering and cost optimization (L7+)

The long tail of video content follows a power law: roughly 20% of videos account for 80% of views. The bottom 50% of videos may never be watched again. Storing all of them in hot standard storage is expensive. Storage lifecycle policies can move videos older than 90 days with fewer than 100 views to cold storage (S3 Glacier, Nearline) at 70–90% cost reduction. Retrieval takes minutes, but for a video nobody has watched in months, a brief delay on re-access is acceptable. The most extreme approach, taken by some platforms, is to delete segments for content below a certain engagement threshold and only retain the raw master file, re-transcoding on demand if someone requests playback.
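
The lifecycle half of that policy as a boto3 sketch. Object-store lifecycle rules can filter by age and tag but not by view count, so the sketch assumes a separate (hypothetical) analytics job has already tagged low-engagement videos:

# Sketch: transition tagged low-engagement videos to cold storage at 90 days.
# Bucket name and tag scheme are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="encoded-segments",
    LifecycleConfiguration={"Rules": [{
        "ID": "cold-tier-low-engagement",
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": "engagement", "Value": "cold"}},
        "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
    }]})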

10

Failure modes & edge cases

Scenario Problem Solution Level
Upload connection drops mid-chunk Partial chunk uploaded; client cannot determine which chunks succeeded Multipart upload with per-chunk ETags: client tracks acknowledged ETags and resumes from the last unacknowledged chunk. S3 multipart upload natively supports this. L4
Transcoding worker crashes mid-job Partially encoded segments written; Kafka message not acknowledged; job will be re-delivered Workers write segments to a temp prefix and atomically move to the final prefix only on job completion. Re-delivery produces a clean re-run. Idempotent job design is essential. L5
Transcoding queue backlog spike New uploads process slowly; creator sees "processing" for hours Auto-scale transcoding worker fleet based on Kafka consumer group lag. Use spot VMs with graceful preemption handling (checkpoint current segment before shutdown). Set a max-processing-time alarm. L5
CDN edge node returns stale manifest Viewer gets a manifest pointing to segments that don't exist yet (processing still ongoing) Manifest is not pushed to CDN until all segments are encoded and confirmed in object store. Status field in metadata prevents manifest_url from being returned while status is 'processing'. L5
Redis counter service outage View count increments are lost during outage window View counts are eventually consistent by NFR, some loss is acceptable in steady state. On Redis recovery, counts resume from the last flushed value. For critical accuracy, write view events to a durable Kafka topic as the source of truth, with Redis as a derived projection. L5
Viral video: CDN thundering herd on publish Millions of simultaneous viewers hit CDN origin before cache warms, overwhelming the object store Proactive cache warming (§9): push all segments to CDN edge before making video public. For breaking events, delay public release by 30s to allow warming to complete. L6
Object store region outage All video segments in that region become inaccessible; streaming fails globally Cross-region replication for segments bucket. CDN configured to fail over to secondary origin. RTOs measured in minutes for cached content (already at edge); uncached content RTOs depend on replication lag. L6
A/B testing different encoding parameters Changing segment length, bitrate ladder, or codec affects all existing manifests Version manifests: store manifests at versioned paths (e.g. /v2/master.m3u8). New uploads use v2; old content retains v1. Backfill re-encoding as a background job, gated by cost analysis. L7/L8
11

How to answer by level

L3/L4: Working end-to-end system
What good looks like
  • Identifies upload and playback as separate paths
  • Mentions object storage for video files
  • Knows videos must be transcoded to different qualities
  • Mentions CDN for delivery
  • Can sketch a basic schema with video metadata
Common gaps at L3/L4
  • Treats upload and streaming as the same API surface
  • No mention of adaptive bitrate, just "different quality buttons"
  • Synchronous transcoding during upload request
  • Serving video directly from app servers, not CDN
L5: Tradeoffs and pipeline design
What good looks like
  • Explains ABR: segments, manifest, client-side quality selection
  • Designs the async transcoding pipeline with Kafka decoupling
  • Identifies view count as a separate write-heavy concern
  • Knows the CDN caching model (pull-on-miss, TTL)
  • Can discuss HLS vs MPEG-DASH tradeoffs
Separates L5 from L6
  • Owns the encoding pipeline end-to-end, including failure handling
  • Addresses hot-row problem on view counts with Redis buffer
  • Considers multipart upload resumability
  • Discusses CDN cache warming for viral content
L6: Platform-level ownership
What good looks like
  • Designs segment-level parallel encoding for large videos
  • Addresses cross-region storage replication for availability NFR
  • Talks about metadata DB sharding strategy
  • Proactive CDN cache warming for new popular content (an L5 topic; L6 goes deeper: CDN API integration, cost-gating logic, and warming only above a traffic threshold)
  • Views the system as a product with creator and viewer personas
Separates L6 from L7
  • Reasons about codec selection (AV1 vs H.264) and its cost implications
  • Addresses manifest versioning and encoding parameter changes
  • Considers ISP-level CDN co-location economics
L7/L8: Cost, codec, and platform strategy
What good looks like
  • Drives codec decisions from unit economics: AV1 = 30-50% smaller files = massive egress savings at scale
  • Considers storage lifecycle: hot/cold tiering by engagement
  • Reasons about build vs. buy for CDN (Open Connect equivalent)
  • Designs the ABR algorithm strategy, not just the infrastructure
  • Addresses the analytics pipeline as a first-class concern
Distinguishing behaviours
  • Connects every architectural decision back to its cost or reliability implication
  • Questions the requirements before designing (live vs. VOD has very different architecture)
  • Thinks about creator tooling and the upload experience as a competitive differentiator

Classic interview probes

Question L3/L4 L5/L6 L7/L8
"Why do we use adaptive bitrate instead of just picking a quality upfront?" Network conditions change during playback, a fixed quality would cause buffering on speed drops. Explains the segment model and manifest, client-side throughput measurement, and buffer health as inputs to the ABR algorithm. Discusses per-title encoding (encoding ladder customised per video's visual complexity) and per-device ABR policy differences (mobile vs. smart TV).
"How would you handle a video that gets 10 million views in the first 5 minutes?" CDN handles it, lots of servers serving the file. CDN cache warming before publish, thundering herd prevention, and origin shield (CDN-to-origin fan-in reduction layer). Pre-positioning strategy, co-location with major ISPs, and cost modeling for egress at 10M concurrent 1080p viewers (~40 Tbps).
"The transcoding queue is backed up, new uploads take 2 hours to become available. What do you do?" Add more servers / workers to the transcoding fleet. Auto-scale on Kafka consumer group lag, spot VMs with checkpoint-based preemption handling, and segment-level parallelism within a single video job. Rethinks the encoding DAG, prioritizes short videos over long ones in the queue, and considers streaming transcoding (serve segments as they complete rather than waiting for the whole video).
"How would you add support for live streaming?" Live streaming needs a different architecture, segments must be published in near real-time as the stream is captured, not after the full upload completes. An ingest server segments the incoming stream on-the-fly and pushes each segment to the CDN as it's ready. Distinguishes live vs. VOD: live requires real-time segmentation and immediate CDN push, very low segment TTL (2–4s), and a different latency target (LL-HLS or WebRTC for low latency). Addresses DVR (time-shift buffer), adaptive ingest from unstable encoder connections, multi-CDN failover for live events, and the architectural tension between latency and reliability in live delivery.

How the pieces connect

Every architectural decision traces back to a requirement or a number. These are the key threads.

1
Playback latency NFR < 2s (§2) → video must be available at the edge before the viewer requests it → CDN edge caching with pull-on-miss + proactive warming for popular content (§4, §8, §9)
2
Transcoding takes minutes (§3 estimation) → upload must not block on encoding → async decoupling via Kafka message queue (§4); upload returns 202 Accepted immediately
3
View count eventual consistency acceptable (§2 NFR) → direct DB increments create hot-row lock contention at scale → Redis INCR buffer with periodic flush to PostgreSQL (§7, §8)
4
Streaming availability 99.99% vs upload 99.9% (§2 NFR) → upload and streaming must be failure-isolated → separate API surfaces; a transcoding backlog cannot degrade active playback (§4)
5
Variable viewer bandwidth (real-world constraint) → fixed quality causes buffering on slow networks → ABR streaming: segment encoding at multiple bitrates, client-side per-segment quality selection (§5)
6
Massive read/write ratio (§3: ~500M viewers, ~500 h/day uploads) → metadata DB cannot absorb every playback as a read → Redis metadata cache with write-through invalidation; DB only on cache miss (§8)
