System Design Interview

YouTube System Design Interview Guide

Simple to describe, hard to scale: serving billions of minutes of video per day requires rethinking every layer of the stack.

L3/L4: working system · L5/L6: encoding pipeline & CDN · L7/L8: global scale & cost · ~22 min read
Architecture diagram of a YouTube-like video streaming system showing upload path, async transcoding workers, CDN edge caching, and playback flow
01

What the interviewer is testing

Video streaming comes up in FAANG interviews because it's genuinely hard in multiple directions at once: you're dealing with large binary assets, async processing pipelines, global delivery, and a client-side protocol most engineers have never had to think about. The "store files and serve them" framing candidates arrive with gets demolished pretty quickly.

Where level shows up most clearly: junior candidates know video needs different quality options; senior candidates understand why the file isn't served as a single blob and what ABR actually means in practice; staff and above think about the economics: CDN egress is a different cost category from database reads, and the architecture reflects that.

Level What good looks like
L3/L4 Can identify the main components (upload, storage, streaming), separate upload from viewing, and mention different video qualities.
L5 Understands adaptive bitrate, chunked upload resumability, CDN cache hierarchy, and async transcoding pipeline. Can reason about tradeoffs.
L6 Owns the encoding pipeline end-to-end, designs the metadata service and counter architecture, and addresses failure modes in the transcoding path.
L7/L8 Thinks about cost optimization (CDN egress, storage tiers), multi-region replication strategy, live vs. on-demand tradeoffs, and platform-level decisions around codec selection (AV1 vs. H.264).
02

Requirements clarification

Before drawing anything, separate the two subsystems: the upload and transcoding path (write-heavy, async, latency-tolerant) and the playback path (read-heavy, synchronous, latency-critical). Most design mistakes come from conflating them.

Functional requirements

Requirement Notes
Upload video Support large files (up to ~10 GB), resumable if interrupted
Transcode to multiple qualities 360p, 480p, 720p, 1080p, 4K; async after upload
Stream video adaptively Client selects quality dynamically based on bandwidth
Video metadata Title, description, tags, uploader, duration, thumbnail
Search & browse Search by title/tag, home feed, channel view; out of scope for this design
View count & engagement Like count, views, eventually consistent is acceptable

Non-functional requirements

Requirement Target
Playback start latency < 2s to first frame on good network
Rebuffering rate < 1% of sessions experience a stall
Upload availability 99.9%; a failed upload is recoverable by retrying
Streaming availability 99.99%; a playback interruption is a direct user-experience failure
Transcoding latency Processing begins within 60s of upload completion; standard video available within 5 min
Storage durability 11 nines (same as S3 standard); data loss is unacceptable
View count accuracy Eventual consistency acceptable; counts may lag by 60s
Playback latency < 2s (NFR reasoning)

Two seconds to first frame is the boundary where users begin to perceive "slowness." Studies on streaming abandonment show a sharp cliff at 3s: roughly 10% of viewers leave for every additional second of startup delay. The 2s target drives the requirement for CDN edge caching (§8): without it, every playback request would traverse the full network path to origin, adding 100–500ms of latency globally.

What this drives: CDN edge node placement (§4), segment prefetching in the ABR client (§5), and pre-warming the CDN cache for popular content (§8).
View count eventual consistency (NFR reasoning)

View counts are inherently a high-write, low-precision metric. At 500M daily active users, even a single view event per user per day averages ~5.8K writes/second, and real traffic is several views per user with pronounced peaks, putting sustained write rates in the tens of thousands per second. Counting with strong consistency would funnel all the writes for a popular video into a single row, creating a hot-spot. A 60-second lag is imperceptible to users and has no product-level consequence. This drives the Redis counter buffer + periodic flush design in §8.

What this drives: Counter service with Redis buffer (§7, §8), durable queue flush pipeline (§9).
Streaming availability 99.99% vs upload 99.9% (NFR reasoning)

These are intentionally asymmetric. An upload failure is a recoverable event: the uploader gets an error and retries. A playback failure during an active session is an irreversible experience failure: the user has already started watching. 99.99% means ~53 minutes of downtime per year; 99.9% allows ~8.7 hours. This asymmetry justifies different infrastructure costs for the two paths: the streaming path warrants multi-region failover; the upload path does not.

What this drives: Multi-region CDN (§4), separate upload and streaming API surfaces (§4), and failure isolation between transcoding and delivery (§10).
03

Capacity estimation

Video systems are unusual in that storage and bandwidth dominate costs rather than compute or QPS. A single minute of 1080p video compresses to roughly 30 MB at H.264 (~4 Mbps). Encoding all 5 quality tiers (360p through 4K) multiplies that by ~5.9×: 4K alone is roughly 4× the size of 1080p, and the lower tiers (720p, 480p, 360p) add another ~0.9× combined. CDN egress, not storage, is usually the biggest line item: a popular video served to 1M viewers costs far more to deliver than to store.

Interactive capacity estimator (defaults: 500M DAU · 40 min average daily watch time per user · 500 hours of video uploaded per day · 4 Mbps average streaming bitrate).
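
A back-of-envelope version of the estimate, using the defaults above and §3's per-minute ladder math. All inputs are illustrative assumptions:

# Back-of-envelope capacity math using the estimator defaults above.
# All inputs are illustrative assumptions, not measured figures.
DAU = 500_000_000            # daily active users
WATCH_MIN_PER_USER = 40      # average watch minutes per user per day
UPLOAD_HOURS_PER_DAY = 500   # hours of video ingested per day
AVG_BITRATE_MBPS = 4         # average delivered bitrate (1080p H.264)

# Egress: total watch-seconds/day x bitrate, averaged over 86,400 s
watch_seconds_per_day = DAU * WATCH_MIN_PER_USER * 60
avg_egress_gbps = watch_seconds_per_day * AVG_BITRATE_MBPS / 86_400 / 1_000
print(f"average CDN egress: {avg_egress_gbps:,.0f} Gbps")   # ~55,556 Gbps ≈ 56 Tbps

# New storage: uploaded minutes/day x ~30 MB/min at 1080p x ~5.9x for all tiers
new_storage_tb_per_day = UPLOAD_HOURS_PER_DAY * 60 * 30 * 5.9 / 1_000_000
print(f"new encoded storage: {new_storage_tb_per_day:.1f} TB/day")  # ~5.3 TB/day

The estimate makes the cost asymmetry concrete: average delivery bandwidth dwarfs daily storage growth, which is why §4 and §8 optimize the delivery path first.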

04

High-level architecture

A video streaming platform separates two fundamentally different subsystems: the upload and transcoding path (write-heavy, asynchronous, latency-tolerant) and the playback path (read-heavy, synchronous, latency-critical). Producers upload raw video files; these are asynchronously encoded into multiple quality variants and distributed to CDN edge nodes. Viewers stream encoded segments directly from the nearest CDN edge, with the player selecting quality adaptively based on available bandwidth.

Upload path (sync until the queue, async after): Creator Client → Upload API (chunked / resumable) → Object Store (raw video) → Message Queue (Kafka) → Transcoding Workers (async) → Object Store (encoded segments). Delivery tier: CDN Origin (pulls from store) → CDN Edge (cached segments). Playback path: Viewer Client + ABR → API Gateway (metadata / manifest) → Metadata DB (PostgreSQL); segment requests go to CDN Edge, falling through to origin on miss.

Upload path (top): chunked upload → object store → async transcoding. CDN tier (middle): encoded segments flow down to CDN Origin; CDN Edge caches and serves. Playback path (bottom): viewer fetches manifest via API Gateway, then streams segments directly from CDN Edge.

Upload API orchestrates uploads without ever touching the video bytes. It issues pre-signed S3 multipart upload URLs to the client; the client uploads each chunk directly to S3, bypassing the API server entirely. Once the client calls /upload/complete with the per-chunk ETags, the API calls S3's CompleteMultipartUpload and publishes a transcoding job to the queue. This keeps the Upload API stateless and prevents it from becoming a bandwidth bottleneck: S3 absorbs all the byte-transfer load.

Object Store holds two kinds of data: raw uploads awaiting transcoding, and the encoded segment files that are the actual playback artefacts. These are structurally separate prefixes within the same store. The raw bucket is ephemeral (files may be lifecycle-deleted after 30 days); the segments bucket is permanent and must be globally durable.

Message Queue (a durable log like Kafka) decouples the upload API from the transcoding workers. Without this, upload latency becomes coupled to transcoding capacity: a transcoding backlog would block new uploads. The queue also provides retry semantics: if a transcoding job fails, its offset is never committed and the job gets re-delivered.

Transcoding Workers consume jobs from the queue, pull raw video from the object store, run FFmpeg (or a custom encoder), and write the output segments back to the object store. The work is CPU-bound and embarrassingly parallel: during peak upload hours you scale out the fleet; overnight you scale it back. Workers are stateless, so there's no coordination overhead.

Metadata DB stores video metadata (title, description, uploader, duration, manifest URL, processing status). This is a relational workload; a SQL database (PostgreSQL) is appropriate. The metadata record is created at upload time with status processing and updated to ready once transcoding completes.

CDN Edge (a content delivery network like Cloudflare or Akamai) serves video segments from points of presence near the viewer. The CDN pulls segments from the object store on first request and caches them at the edge. For popular videos, 95%+ of requests are served entirely from edge cache without touching origin.

API Gateway handles the viewer's two non-video requests: fetching video metadata and fetching the manifest file that describes available qualities and segment URLs. These are tiny JSON payloads; the gateway routes to the metadata service, which hits the database (or cache) and returns within milliseconds.

Architectural rationale

Separate upload and playback API surfaces (isolation)

Upload is write-heavy, latency-tolerant, and intermittently bursty (viral event → sudden upload spike). Playback is read-heavy, latency-critical, and needs 99.99% availability. Running them as separate services means a transcoding backlog or upload service incident cannot degrade streaming. This directly addresses the asymmetric NFRs from §2.

Tradeoff: Two API surfaces to operate, but the failure isolation is worth it. Uploads can degrade gracefully; playback must not.
Alternative: a unified gateway with upload/stream routes; simpler to start with, but it creates coupling at the wrong layer.
Async transcoding via message queue (decoupling)

Transcoding a 1-hour video takes minutes even on fast hardware, far too slow to do synchronously in an upload request. The queue (Kafka) acts as a buffer: upload completes instantly, transcoding happens when workers are available. Consumer group offsets give at-least-once delivery; paired with idempotent job design (§10), a re-delivered job cannot produce duplicate output.

Tradeoff: Adds operational complexity (a Kafka cluster). A simpler alternative at an early stage is a job queue backed by PostgreSQL (advisory locks) or Redis (LMOVE, formerly RPOPLPUSH). Kafka becomes necessary at multi-thousand-jobs/minute scale.
CDN-first delivery, segments never served from origin (cost + latency)

Video segments are immutable once encoded: a given segment URL will always return the same bytes. This is ideal for CDN caching: a segment cached at an edge node serves unlimited viewers without touching origin. Without a CDN, a single viral video could saturate an entire datacenter's egress. The CDN doesn't just improve latency; it makes the unit economics of video streaming viable.

Tradeoff: CDN egress is still the largest cost line. At scale, platforms push this further by co-locating with ISPs (Netflix's Open Connect) to serve from within the ISP's network, eliminating third-party CDN cost for top-tier traffic.

Real-world comparison

Decision This design YouTube Netflix
Streaming protocol HLS or MPEG-DASH MPEG-DASH (custom) MPEG-DASH
Encoding FFmpeg workers on Kafka jobs Spanner-coordinated transcoding Cosmos (distributed encoding)
CDN delivery Third-party CDN (Cloudflare/Akamai) Google's own CDN infrastructure Open Connect (ISP-embedded)
Primary codec H.264 (widest compat) VP9 / AV1 H.264 / H.265 / AV1
Metadata store PostgreSQL Bigtable + Spanner Cassandra + MySQL
💡

Every major platform eventually builds custom infrastructure around CDN delivery because egress is the dominant cost. The decisions cascade from codec (AV1 compresses ~30% better than H.264, which translates directly to egress savings at scale) through to delivery architecture. YouTube and Netflix chose differently from each other, and both chose differently from this design, because their scale, device targets, and cost structures differ.

05

Adaptive bitrate streaming

Adaptive bitrate (ABR) streaming is a video delivery technique where the source video is cut into short segments (typically 4–6 seconds) and encoded at multiple quality levels — for example 360p, 720p, 1080p, and 4K. A manifest file (HLS .m3u8 or DASH .mpd) lists all available quality variants and their segment URLs. The client player measures current network bandwidth and buffer fill, then requests each segment at the highest quality it can sustain without rebuffering. The server’s job is to make all versions available; the adaptation logic lives entirely in the player.

Most distributed systems serve one canonical response to a request. Video streaming serves dozens of versions of the same content; the client picks which one to fetch next, segment by segment, based on what its network can handle right now.

The mechanism: cut the video into short segments (4–6 seconds each), encode each segment at multiple quality levels, and publish a manifest listing all the options. The client downloads the next segment at whatever quality its measured throughput can sustain. Because it's only committing 4–6 seconds ahead, it can step down quality quickly when the network degrades, without making the viewer wait for a full rebuffer.

Source (raw upload) → Segmenter (4–6s chunks) → parallel encoding workers (4K / 15 Mbps · 1080p / 4 Mbps · 720p / 2.5 Mbps · 480p / 1 Mbps · 360p / 0.5 Mbps) → Manifest (.m3u8 / .mpd) → ABR Client (measures bandwidth, monitors buffer health, selects quality per segment).

A single source video produces 5 quality variants. The manifest lists all variants; the ABR client picks which to request, segment by segment, based on measured bandwidth and buffer state.

The two standard protocols: HLS and MPEG-DASH

HLS (HTTP Live Streaming) MPEG-DASH
Origin Apple (2009) MPEG consortium (2012)
Manifest format .m3u8 (plain text) .mpd (XML)
Segment format MPEG-TS or fMP4 fMP4 (fragmented)
Native device support All Apple devices natively; other browsers via a JS player (hls.js) Android, Smart TVs, browsers (via JS library)
Codec flexibility Limited; H.264/H.265 primary Any codec (AV1, VP9, H.265)
Royalties Apple patents Royalty-free

Our choice for this design is HLS with fMP4 segments. HLS offers the broadest device compatibility (including all Apple devices), and fMP4 segments are compatible with most DASH players, giving us the option to serve both protocols from the same segment files. For interview purposes, either choice is fine; the important thing is explaining the tradeoff.
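
For concreteness, a minimal HLS master playlist for the five-tier ladder above might look like this; URLs, BANDWIDTH figures, and codec strings are illustrative placeholders:

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-STREAM-INF:BANDWIDTH=600000,RESOLUTION=640x360,CODECS="avc1.64001e,mp4a.40.2"
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480,CODECS="avc1.64001f,mp4a.40.2"
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720,CODECS="avc1.640020,mp4a.40.2"
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4400000,RESOLUTION=1920x1080,CODECS="avc1.64002a,mp4a.40.2"
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15600000,RESOLUTION=3840x2160,CODECS="avc1.640034,mp4a.40.2"
2160p/playlist.m3u8

Each variant URL points to a media playlist listing the individual 4–6s fMP4 segment URLs; the player re-evaluates which variant to fetch from before every segment request.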

ABR algorithm internals (advanced)

The ABR algorithm runs on the client and makes a quality decision before fetching each segment. The simplest algorithm (throughput-based) measures the download speed of the last N segments and picks the highest bitrate that fits within 80% of that speed. More sophisticated algorithms (BOLA, MPC) consider buffer occupancy: if the buffer is draining fast, step down quality; if the buffer is full, it's safe to step up. Pensieve (MIT, 2017) used reinforcement learning to train per-network-condition ABR policies, an example of where L7-level thinking applies.
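
A minimal sketch of the throughput-based selector just described; the ladder comes from §5's encoding tiers, while the window size and the 0.8 safety factor are illustrative assumptions:

# Sketch: throughput-based ABR bitrate selection (client-side).
# Ladder from §5; window size and safety factor are illustrative assumptions.
BITRATE_LADDER_KBPS = [500, 1000, 2500, 4000, 15000]   # 360p .. 4K
SAFETY_FACTOR = 0.8     # only budget 80% of measured throughput
WINDOW = 3              # average over the last N segment downloads

def pick_bitrate(recent_throughputs_kbps: list[float]) -> int:
    """Highest ladder rung sustainable at 80% of recent throughput."""
    window = recent_throughputs_kbps[-WINDOW:]
    budget = (sum(window) / len(window)) * SAFETY_FACTOR
    sustainable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return max(sustainable) if sustainable else BITRATE_LADDER_KBPS[0]

# ~6 Mbps measured -> budget ~4.8 Mbps -> picks 4000 kbps (1080p)
print(pick_bitrate([6100, 5800, 6300]))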

Interview signal: Knowing that the ABR algorithm is a client-side concern, not a server-side one, is a useful differentiator. The server just needs to make all quality levels available; the intelligence of adaptation lives in the player.
🎯

Common follow-up: "What segment length should we use?" Shorter segments (2s) give faster adaptation but more HTTP overhead. Longer segments (10s) are more efficient but adapt slowly to network changes. Most platforms converge on 4–6 seconds as the sweet spot. Segment length is fixed at encoding time and encoded into the manifest.

05b

API design

POST /videos/upload/init: Initiate chunked upload

// Request
{
  "filename": "vacation.mp4",
  "file_size_bytes": 2147483648,  // 2 GB
  "content_type": "video/mp4",
  "title": "Summer Vacation 2025",
  "description": "..."
}

// Response 201 Created
{
  "upload_id": "upl_8f3k2j9...",
  "video_id": "vid_4x2k9...",
  "chunk_urls": [           // pre-signed S3 multipart URLs
    { "part_number": 1, "url": "https://s3.../..." },
    // ...
  ],
  "chunk_size_bytes": 5242880  // 5 MB chunks
}

The upload API returns pre-signed S3 multipart upload URLs. The client uploads each chunk directly to S3; the upload API server never touches the video bytes, only the metadata. This prevents the API server from becoming a bandwidth bottleneck and offloads the heavy lifting to S3's durable storage layer.
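
A sketch of how the init endpoint might mint those URLs with boto3; the bucket name, expiry, and chunk sizing are assumptions, while create_multipart_upload and generate_presigned_url are standard boto3 calls:

# Sketch: issue pre-signed multipart-upload URLs from the init endpoint.
# Bucket name, expiry, and chunk size are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
BUCKET = "raw-uploads"                     # hypothetical raw-video bucket
CHUNK_SIZE = 5 * 1024 * 1024               # 5 MB, matching the response above

def init_multipart(key: str, file_size: int) -> dict:
    mpu = s3.create_multipart_upload(
        Bucket=BUCKET, Key=key, ContentType="video/mp4")
    n_parts = -(-file_size // CHUNK_SIZE)  # ceiling division
    chunk_urls = [
        {"part_number": i,
         "url": s3.generate_presigned_url(
             "upload_part",
             Params={"Bucket": BUCKET, "Key": key,
                     "UploadId": mpu["UploadId"], "PartNumber": i},
             ExpiresIn=3600)}
        for i in range(1, n_parts + 1)]
    return {"upload_id": mpu["UploadId"], "chunk_urls": chunk_urls}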

🔒

Auth and abuse prevention (don't skip this in an interview): The upload init endpoint must require authentication (OAuth token or API key); unauthenticated upload would allow anyone to fill your object store. Rate-limit per creator (e.g. 10 uploads/hour) to prevent abuse. Validate content_type server-side before issuing pre-signed URLs: only issue URLs for video MIME types. File-type validation beyond MIME type (a magic-bytes check) happens after the upload lands in the raw bucket, before the transcoding job is enqueued.
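
A sketch of that magic-bytes check for the MP4 container only (ISO BMFF layout: a 4-byte box size followed by 'ftyp'); other containers would need their own signatures, and the file path is a hypothetical local copy:

# Sketch: verify an upload is really an MP4 before enqueueing a transcode job.
# Checks the ISO BMFF 'ftyp' box rather than trusting the declared MIME type.
def looks_like_mp4(first_bytes: bytes) -> bool:
    # ISO BMFF files begin with a box header: 4-byte size, then 'ftyp'.
    return len(first_bytes) >= 12 and first_bytes[4:8] == b"ftyp"

with open("upload.bin", "rb") as f:   # hypothetical local copy or S3 range-read
    if not looks_like_mp4(f.read(12)):
        raise ValueError("rejected: not an MP4 container")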

POST /videos/upload/complete: Finalize and trigger transcoding

// Request
{
  "upload_id": "upl_8f3k2j9...",
  "parts": [
    { "part_number": 1, "etag": "\"abc123\"" },
    // ...
  ]
}

// Response 202 Accepted
{
  "video_id": "vid_4x2k9...",
  "status": "processing",
  "estimated_ready_at": "2025-06-01T12:05:00Z"
}

GET /videos/:id: Fetch video metadata and manifest

// Response 200 OK
{
  "video_id": "vid_4x2k9...",
  "title": "Summer Vacation 2025",
  "uploader": { "id": "usr_...", "name": "Alice" },
  "duration_seconds": 342,
  "view_count": 18420,
  "status": "ready",           // or "processing" | "failed"
  "manifest_url": "https://cdn.example.com/videos/vid_4x2k9/master.m3u8",
  "thumbnail_url": "https://cdn.example.com/thumbs/vid_4x2k9.jpg"
}
Endpoint Level Notes
DELETE /videos/:id L5 Soft delete: mark metadata as deleted; CDN cache TTL expires naturally. Hard delete from object store is async.
GET /videos/:id/captions L5 Returns available caption tracks (WebVTT format). Auto-generated via async ASR pipeline.
POST /videos/:id/views L5 Increments view counter. Idempotent per session via session-token dedupe in Redis.
GET /videos/:id/analytics L7/L8 Per-segment watch-time heatmap. Requires a separate analytics pipeline and a columnar store (ClickHouse or BigQuery).
🖼️

Thumbnail generation: Thumbnails are extracted as a lightweight step inside the transcoding pipeline. After the first pass of encoding completes, FFmpeg extracts a frame at the 10% mark of the video duration and stores it to the object store (e.g. thumbs/{video_id}.jpg). The metadata record's thumbnail_url is populated at the same time the manifest_url is set, so both become available atomically when status transitions to ready.
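
A sketch of that extraction step; the paths are illustrative, while the ffmpeg flags shown (input seek, single frame, JPEG quality) are standard:

# Sketch: extract a thumbnail at the 10% mark of the video with ffmpeg.
# Input/output paths are illustrative assumptions.
import subprocess

def extract_thumbnail(src: str, dst: str, duration_sec: float) -> None:
    seek = duration_sec * 0.10            # the 10% mark, per the note above
    subprocess.run(
        ["ffmpeg",
         "-ss", f"{seek:.2f}",            # seek before -i: fast keyframe seek
         "-i", src,
         "-frames:v", "1",                # grab a single frame
         "-q:v", "2",                     # high JPEG quality
         "-y", dst],
        check=True)

extract_thumbnail("raw/vid_4x2k9.mp4", "thumbs/vid_4x2k9.jpg", 342.0)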

06

Core flows: upload & playback

Upload flow

The latency NFR for uploads is relaxed: users expect processing to take minutes, not milliseconds. The primary requirements are reliability (no data loss) and resumability (a dropped connection should not require re-uploading 2 GB). These requirements together drive the multipart upload pattern, where each chunk is independently acknowledged before the next is sent.

Client → Upload API: POST /upload/init → 201 + pre-signed chunk URLs. Client → Object Store: PUT chunk directly to S3 (×N) → 200 + ETag per chunk. Client → Upload API: POST /upload/complete + ETags → Upload API: CompleteMultipartUpload against S3, publish transcode job to Kafka + Workers → Client: 202 Accepted, status=processing.

Client uploads chunks directly to S3, bypassing the Upload API for byte transfer. API only handles metadata and orchestration. Transcoding is triggered asynchronously.

Playback flow

The playback latency NFR (<2s to first frame, from §2) drives the key design decision here: the manifest and first segment must be available from CDN edge before the player starts buffering. The manifest file is tiny (~1 KB) but is fetched on every play; it goes through the API gateway, which checks the metadata DB for the manifest URL and returns it. The segments themselves are fetched directly by the player from the CDN edge URL embedded in the manifest.

💡

Why this matters for latency: The manifest round-trip through the API gateway adds ~50–100ms latency (fast, cached). If the first CDN edge segment is already cached, the player can start rendering within 200–400ms of playback start. If it needs to fetch from origin (cold cache), add another 100–500ms depending on geography. The CDN cache hit rate for popular content is typically >95%, making cold-cache paths rare in practice.

07

Data model

The video platform has three distinct data domains with very different access patterns. The access patterns are worth looking at first: they're what determine the schema, not the other way around.

Operation Frequency Query shape
Fetch video metadata by ID Very high (every playback) Point lookup by video_id
List videos by uploader High (channel page) Range scan by uploader_id
Increment view count Very high (every play event) Counter increment by video_id
Update processing status Low (once per transcoding job) Point update by video_id
Fetch user profile High Point lookup by user_id

Two things stand out from this table. First, the dominant read pattern is a point lookup by video_id; this is extremely cache-friendly and drives the Redis metadata cache in §8. Second, view count increments are so frequent that they cannot go directly to the primary database without creating a hot-spot on popular video rows; they need their own buffered write path.

Schema

-- Core video record
CREATE TABLE videos (
  video_id      VARCHAR(16)  PRIMARY KEY,
  uploader_id   VARCHAR(16)  NOT NULL REFERENCES users(user_id),
  title         TEXT         NOT NULL,
  description   TEXT,
  duration_sec  INTEGER,
  status        VARCHAR(16)  NOT NULL DEFAULT 'processing',
                             -- 'processing' | 'ready' | 'failed'
  manifest_url  TEXT,        -- null until transcoding completes
  thumbnail_url TEXT,
  created_at    TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

-- View counts live separately, write-heavy, eventually consistent
-- Managed by counter service; flushed here periodically from Redis
CREATE TABLE video_stats (
  video_id      VARCHAR(16)  PRIMARY KEY REFERENCES videos(video_id),
  view_count    BIGINT       NOT NULL DEFAULT 0,
  like_count    BIGINT       NOT NULL DEFAULT 0,
  updated_at    TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

-- Indexes
CREATE INDEX idx_videos_uploader ON videos(uploader_id, created_at DESC);
CREATE INDEX idx_videos_status   ON videos(status) WHERE status = 'processing';
video_stats as a separate table (hot-spot prevention)

Popular videos can receive thousands of view events per second. If view_count lived as a column in the videos table, every increment would lock that row, creating severe write contention. Separating it into video_stats doesn't solve the hot-row problem alone, but combined with the Redis buffer pattern in §8, it means the database only receives periodic bulk flushes rather than per-event increments. The updated_at column lets us confirm flush recency.

Tradeoff: Reads that need both metadata and view count must join two tables. In practice, the metadata service fetches from Redis (which holds both), so this join is rare; it only occurs on cache miss.
status field on the videos table (polling simplicity)

After upload, the client can poll GET /videos/:id to check when status transitions from processing to ready. This is simpler than setting up a websocket or webhook for most upload clients. The partial index on (status) WHERE status = 'processing' makes it efficient to query all in-flight jobs from the monitoring dashboard without a full table scan.

08

Caching strategy

Video streaming uses three distinct caching layers: a CDN edge cache (stores immutable video segments close to viewers, TTL 7 days), a Redis metadata cache (stores serialized video JSON per video_id, handles ~1M QPS, TTL 10 minutes), and a Redis counter buffer (accumulates view and like increments in memory, flushed to PostgreSQL every 60 seconds). The CDN layer is what makes the economics of video streaming viable — at peak, 95%+ of segment requests are served from edge cache without touching origin.

A video uploaded once might be watched ten million times. That read/write ratio shapes every caching decision, and at this scale caching isn't optional; it's what makes the cost model work at all. There are three distinct layers, each addressing a different part of the request path.

① CDN Edge Cache: video segments, thumbnails (TTL 7 days, immutable). ② Redis Metadata Cache: video metadata JSON (TTL 10 min, LRU eviction). ③ Redis Counter Buffer: view/like counts (atomic), flushed every 60s to the DB. Metadata misses fall through to PostgreSQL (video metadata + stats); CDN misses fall through to the Object Store (CDN origin).

Three caching layers: CDN edge for video bytes, Redis for metadata JSON, Redis for write-buffered view counters. Each layer is positioned in the §4 request path.

Layer What it caches Why it exists Invalidation
① CDN Edge Video segments (.ts / .m4s), thumbnails, manifests Latency NFR from §2: <2s to first frame requires edge proximity. Also the only way to serve massive concurrent viewers economically. Segments are content-addressed and immutable; TTL of 7 days, no explicit invalidation needed. Manifests TTL 5 min (status may change).
② Redis Metadata Serialised video metadata JSON per video_id Every playback triggers a metadata fetch. The DB can handle ~10K QPS; Redis handles ~1M QPS, essential at scale. Explicit invalidation on any write: title edits, description changes, and, most critically, when status transitions from processing to ready (this is when manifest_url becomes non-null and must be visible to viewers immediately). TTL of 10 min as a safety net. Write-through on update.
③ Redis Counter In-memory atomic counters: views:{video_id}, likes:{video_id} View counts are write-heavy; direct DB increments at scale cause hot-row lock contention. Redis INCR is O(1) and lock-free. Not a cache: it's the write buffer. A background job periodically flushes deltas to PostgreSQL using UPDATE video_stats SET view_count = view_count + $delta.
Counter flush strategy and accuracy (L5+)

The flush job runs every 60 seconds per the eventual consistency NFR from §2. It captures each counter's current value and resets it to 0 atomically, then writes the delta to PostgreSQL. GETSET has been deprecated since Redis 6.2 (superseded by SET with the GET option); a clean approach is a Lua script, which Redis executes atomically:

-- Atomically capture and reset a counter.
-- Returns the previous value as a string, or nil if the key didn't exist
-- (i.e. no views since the last flush). Caller must handle nil.
local val = redis.call('GET', KEYS[1])
if val then redis.call('SET', KEYS[1], '0') end
return val  -- nil means delta = 0; skip the DB write

The nil return matters: if a video received no views since the last flush, the key won't exist and the script returns nil. A flush job that doesn't handle nil will throw a null-reference error for every cold video in the counter keyspace. The fix is simple: skip the DB write when val is nil. A crash mid-flush can still lose a window of counts (counter reset but the DB write never committed); the eventual consistency NFR explicitly permits this, and the loss is bounded by 60 seconds of view events.
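
A sketch of the flush job calling that script, using redis-py and psycopg2 against the §7 schema; the connection details and key-scan pattern are assumptions:

# Sketch: the 60-second counter flush from Redis to PostgreSQL.
# Connection parameters and the key-scanning strategy are illustrative.
import redis
import psycopg2

CAPTURE_AND_RESET = """
local val = redis.call('GET', KEYS[1])
if val then redis.call('SET', KEYS[1], '0') end
return val
"""

r = redis.Redis()
pg = psycopg2.connect("dbname=videos")       # hypothetical DSN
capture = r.register_script(CAPTURE_AND_RESET)

def flush_counters():
    with pg, pg.cursor() as cur:             # commits the batch on exit
        for key in r.scan_iter(match="views:*"):
            delta = capture(keys=[key])
            if delta is None or int(delta) == 0:
                continue                     # cold video: skip the DB write
            video_id = key.decode().split(":", 1)[1]
            cur.execute(
                "UPDATE video_stats SET view_count = view_count + %s, "
                "updated_at = NOW() WHERE video_id = %s",
                (int(delta), video_id))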

At YouTube scale: View counting moves to a dedicated pipeline: events go through a durable log (Kafka) → a stream-processing engine (Apache Flink or Google Dataflow) for aggregation → periodic bulk upsert to the stats store. This handles deduplication (the same user watching repeatedly), bot filtering, and replay analysis, none of which are possible with a simple Redis INCR.
09

Deep-dive scalability

The following scalability problems don't all arise at the same time. A service at 1M daily users looks different from one at 100M. These are the problems that emerge as you scale, roughly in the order they bite you.

Transcoding pipeline parallelism (L5+)

A naive transcoding worker processes a video sequentially, segment by segment, quality by quality. YouTube processes roughly 500 hours of video per minute; note the estimator defaults to 500 hours per day, about 0.07% of YouTube's actual ingest volume, intentionally scoped to a large-but-not-hyperscaler service. Even at that lower scale, queue backlog is a real concern during upload spikes. The solution is to split work at two levels: across videos (multiple workers consume from Kafka concurrently) and within a single video (segment-level parallelism using a DAG scheduler). Each 4-second chunk can be encoded independently: a 1-hour video has ~900 chunks, each encodeable in parallel.

Netflix has built an internal distributed encoding platform (Cosmos) where encoding is broken into thousands of micro-tasks dispatched to a serverless fleet. A 2-hour movie might run on 10,000 concurrent compute nodes to complete encoding in minutes rather than hours.

Practical starting point: A Kafka consumer group with N workers per quality tier, using spot/preemptible VMs for cost. Each worker handles one video at a time, encoding all qualities sequentially. Add segment-level parallelism only when queue depth consistently exceeds target latency. A minimal worker-loop sketch follows.
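
The sketch below uses the kafka-python client; the topic name, job schema, and the transcode() body are hypothetical:

# Sketch: a transcoding worker consuming jobs with at-least-once semantics.
# Topic name, job schema, and transcode() internals are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "transcode-jobs",                  # hypothetical topic name
    group_id="transcoders",
    enable_auto_commit=False,          # commit only after the job succeeds
    value_deserializer=lambda v: json.loads(v))

for msg in consumer:
    job = msg.value                    # e.g. {"video_id": ..., "raw_key": ...}
    try:
        transcode(job)                 # hypothetical: segment, encode all tiers,
                                       # write to temp prefix, atomic move (§10)
        consumer.commit()              # ack: offset advances past this job
    except Exception:
        # Crash without committing: after restart/rebalance the uncommitted
        # offset is re-delivered (at-least-once). Idempotent output paths
        # make the re-run safe.
        raise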
Metadata DB sharding (L5+)

A single PostgreSQL instance handles roughly 10K–20K QPS on reads with good indexing and hardware. At 500M DAU each triggering 1–2 metadata reads, peak QPS exceeds this comfortably; even with Redis shielding most requests, cache misses and writes accumulate. The natural shard key is video_id (a UUID or hash-based ID), which distributes video rows uniformly. A consistent hashing ring with 8–16 shards handles the metadata workload at this scale. Cross-shard queries (e.g. aggregates across all videos) are rare and can run on read replicas with eventual consistency.

Alternative: Cassandra or Bigtable as the metadata store; naturally distributed, good for high-QPS point lookups. Tradeoff: no joins, no transactions, operational complexity. At 100M+ DAU this becomes the right choice; below that, sharded PostgreSQL is operationally simpler.
CDN cache warming for new videos (L5+)

When a new video goes live, its segments are not cached anywhere, every early viewer hits CDN origin (the object store), which is both slower and more expensive. For ordinary content, this is fine: the CDN populates its cache organically as early viewers watch. For a high-profile release (major channel launch, live event VOD), a thundering herd of simultaneous viewers can overwhelm origin before the cache warms. The solution is proactive cache warming: once transcoding completes, a background job pre-fetches all segments at all quality levels to the CDN, establishing cache entries before the video is publicly visible. Most CDNs support an API for this.

Cost consideration: Cache warming incurs CDN transfer fees on every edge node you warm. For non-viral content this is wasteful; the organic approach is more cost-efficient. Target warming at content expected to receive >100K views in the first hour.
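
A sketch of a warming job. Real CDNs expose vendor-specific prefetch APIs; this generic version simply pulls each segment through the edge URL so the cache populates. The URL layout and segment naming are hypothetical:

# Sketch: warm the CDN cache by requesting every segment once.
# URL layout and segment naming are illustrative assumptions.
import requests

CDN_BASE = "https://cdn.example.com/videos"
QUALITIES = ["360p", "480p", "720p", "1080p", "2160p"]

def warm(video_id: str, n_segments: int) -> None:
    for quality in QUALITIES:
        for i in range(n_segments):
            url = f"{CDN_BASE}/{video_id}/{quality}/seg_{i:05d}.m4s"
            # GET rather than HEAD: many CDNs only cache full-body responses.
            requests.get(url, timeout=10).raise_for_status()

warm("vid_4x2k9", n_segments=86)   # ~344s of video at 4s segments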
Multi-region storage and failover (L6+)

Object stores like S3 are single-region by default. A region outage would make all segments in that region inaccessible. The durability NFR (11 nines) is about disk failure, not region failure; for the streaming availability NFR (99.99%), cross-region replication is required. Options: S3 Cross-Region Replication (asynchronous, minutes of lag), a natively multi-region object store (Google Cloud Storage multi-region buckets replicate across regions by default), or running dual S3 buckets in separate regions with application-level write fan-out on upload. The CDN's pull-on-miss should be configured to fail over to the secondary origin if the primary returns an error.

Storage tiering and cost optimization (L7+)

The long tail of video content follows a power law: roughly 20% of videos account for 80% of views. The bottom 50% of videos may never be watched again. Storing all of them in hot standard storage is expensive. Storage lifecycle policies can move videos older than 90 days with fewer than 100 views to cold storage (S3 Glacier, Nearline) at 70–90% cost reduction. Retrieval takes minutes, but for a video nobody has watched in months, a brief delay on re-access is acceptable. The most extreme approach, taken by some platforms, is to delete segments for content below a certain engagement threshold and only retain the raw master file, re-transcoding on demand if someone requests playback.
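
The lifecycle half of that policy as a boto3 sketch. Object-store lifecycle rules can filter by age and tag but not by view count, so the sketch assumes a separate (hypothetical) analytics job has already tagged low-engagement videos:

# Sketch: transition tagged low-engagement videos to cold storage at 90 days.
# Bucket name and tag scheme are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="encoded-segments",
    LifecycleConfiguration={"Rules": [{
        "ID": "cold-tier-low-engagement",
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": "engagement", "Value": "cold"}},
        "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
    }]})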

10

Failure modes & edge cases

Scenario Problem Solution Level
Upload connection drops mid-chunk Partial chunk uploaded; client cannot determine which chunks succeeded Multipart upload with per-chunk ETags: client tracks acknowledged ETags and resumes from the last unacknowledged chunk. S3 multipart upload natively supports this. L4
Transcoding worker crashes mid-job Partially encoded segments written; Kafka message not acknowledged; job will be re-delivered Workers write segments to a temp prefix and atomically move to the final prefix only on job completion. Re-delivery produces a clean re-run. Idempotent job design is essential. L5
Transcoding queue backlog spike New uploads process slowly; creator sees "processing" for hours Auto-scale transcoding worker fleet based on Kafka consumer group lag. Use spot VMs with graceful preemption handling (checkpoint current segment before shutdown). Set a max-processing-time alarm. L5
CDN edge node returns stale manifest Viewer gets a manifest pointing to segments that don't exist yet (processing still ongoing) Manifest is not pushed to CDN until all segments are encoded and confirmed in object store. Status field in metadata prevents manifest_url from being returned while status is 'processing'. L5
Redis counter service outage View count increments are lost during outage window View counts are eventually consistent by NFR, some loss is acceptable in steady state. On Redis recovery, counts resume from the last flushed value. For critical accuracy, write view events to a durable Kafka topic as the source of truth, with Redis as a derived projection. L5
Viral video: CDN thundering herd on publish Millions of simultaneous viewers hit CDN origin before cache warms, overwhelming the object store Proactive cache warming (§9): push all segments to CDN edge before making video public. For breaking events, delay public release by 30s to allow warming to complete. L6
Object store region outage All video segments in that region become inaccessible; streaming fails globally Cross-region replication for segments bucket. CDN configured to fail over to secondary origin. RTOs measured in minutes for cached content (already at edge); uncached content RTOs depend on replication lag. L6
A/B testing different encoding parameters Changing segment length, bitrate ladder, or codec affects all existing manifests Version manifests: store manifests at versioned paths (e.g. /v2/master.m3u8). New uploads use v2; old content retains v1. Backfill re-encoding as a background job, gated by cost analysis. L7/L8
11

How to answer by level

L3/L4: Working end-to-end system
What good looks like
  • Identifies upload and playback as separate paths
  • Mentions object storage for video files
  • Knows videos must be transcoded to different qualities
  • Mentions CDN for delivery
  • Can sketch a basic schema with video metadata
Common gaps at L3/L4
  • Treats upload and streaming as the same API surface
  • No mention of adaptive bitrate, just "different quality buttons"
  • Synchronous transcoding during upload request
  • Serving video directly from app servers, not CDN
L5: Tradeoffs and pipeline design
What good looks like
  • Explains ABR: segments, manifest, client-side quality selection
  • Designs the async transcoding pipeline with Kafka decoupling
  • Identifies view count as a separate write-heavy concern
  • Knows the CDN caching model (pull-on-miss, TTL)
  • Can discuss HLS vs MPEG-DASH tradeoffs
Separates L5 from L6
  • Owns the encoding pipeline end-to-end, including failure handling
  • Addresses hot-row problem on view counts with Redis buffer
  • Considers multipart upload resumability
  • Discusses CDN cache warming for viral content
L6: Platform-level ownership
What good looks like
  • Designs segment-level parallel encoding for large videos
  • Addresses cross-region storage replication for availability NFR
  • Talks about metadata DB sharding strategy
  • Proactive CDN cache warming for new popular content (an L5 topic; L6 goes deeper: CDN API integration, cost-gating logic, and warming only above a traffic threshold)
  • Views the system as a product with creator and viewer personas
Separates L6 from L7
  • Reasons about codec selection (AV1 vs H.264) and its cost implications
  • Addresses manifest versioning and encoding parameter changes
  • Considers ISP-level CDN co-location economics
L7/L8: Cost, codec, and platform strategy
What good looks like
  • Drives codec decisions from unit economics: AV1 = 30-50% smaller files = massive egress savings at scale
  • Considers storage lifecycle: hot/cold tiering by engagement
  • Reasons about build vs. buy for CDN (Open Connect equivalent)
  • Designs the ABR algorithm strategy, not just the infrastructure
  • Addresses the analytics pipeline as a first-class concern
Distinguishing behaviours
  • Connects every architectural decision back to its cost or reliability implication
  • Questions the requirements before designing (live vs. VOD has very different architecture)
  • Thinks about creator tooling and the upload experience as a competitive differentiator

Classic interview probes

Question L3/L4 L5/L6 L7/L8
"Why do we use adaptive bitrate instead of just picking a quality upfront?" Network conditions change during playback, a fixed quality would cause buffering on speed drops. Explains the segment model and manifest, client-side throughput measurement, and buffer health as inputs to the ABR algorithm. Discusses per-title encoding (encoding ladder customised per video's visual complexity) and per-device ABR policy differences (mobile vs. smart TV).
"How would you handle a video that gets 10 million views in the first 5 minutes?" CDN handles it, lots of servers serving the file. CDN cache warming before publish, thundering herd prevention, and origin shield (CDN-to-origin fan-in reduction layer). Pre-positioning strategy, co-location with major ISPs, and cost modeling for egress at 10M concurrent 1080p viewers (~40 Tbps).
"The transcoding queue is backed up, new uploads take 2 hours to become available. What do you do?" Add more servers / workers to the transcoding fleet. Auto-scale on Kafka consumer group lag, spot VMs with checkpoint-based preemption handling, and segment-level parallelism within a single video job. Rethinks the encoding DAG, prioritizes short videos over long ones in the queue, and considers streaming transcoding (serve segments as they complete rather than waiting for the whole video).
"How would you add support for live streaming?" Live streaming needs a different architecture, segments must be published in near real-time as the stream is captured, not after the full upload completes. An ingest server segments the incoming stream on-the-fly and pushes each segment to the CDN as it's ready. Distinguishes live vs. VOD: live requires real-time segmentation and immediate CDN push, very low segment TTL (2–4s), and a different latency target (LL-HLS or WebRTC for low latency). Addresses DVR (time-shift buffer), adaptive ingest from unstable encoder connections, multi-CDN failover for live events, and the architectural tension between latency and reliability in live delivery.

How the pieces connect

Every architectural decision traces back to a requirement or a number. These are the key threads.

1
Playback latency NFR < 2s (§2) → video must be available at the edge before the viewer requests it → CDN edge caching with pull-on-miss + proactive warming for popular content (§4, §8, §9)
2
Transcoding takes minutes (§3 estimation) → upload must not block on encoding → async decoupling via Kafka message queue (§4); upload returns 202 Accepted immediately
3
View count eventual consistency acceptable (§2 NFR) → direct DB increments create hot-row lock contention at scale → Redis INCR buffer with periodic flush to PostgreSQL (§7, §8)
4
Streaming availability 99.99% vs upload 99.9% (§2 NFR) → upload and streaming must be failure-isolated → separate API surfaces; a transcoding backlog cannot degrade active playback (§4)
5
Variable viewer bandwidth (real-world constraint) → fixed quality causes buffering on slow networks → ABR streaming: segment encoding at multiple bitrates, client-side per-segment quality selection (§5)
6
Massive read/write ratio (§3: ~500M viewers, ~500 h/day uploads) → metadata DB cannot absorb every playback as a read → Redis metadata cache with write-through invalidation; DB only on cache miss (§8)
