Design a URL Shortener: System Design Interview Guide
Simple to describe, hard to scale. A complete walkthrough from requirements to production-grade architecture, with the reasoning behind every decision.
What the interviewer is testing
The URL shortener is deceptively simple to describe and genuinely hard to scale. That gap is exactly why it's the canonical system design question. It forces you to reason about capacity, caching, encoding, sharding, and analytics pipelines all at once, in a system everyone already understands.
What changes across levels isn't the system, it's the frame. At L3/L4 the question is whether you can arrive at a working design independently. At L5/L6 it's whether you understand the tradeoffs in your own choices before being asked. At L7/L8 it's whether you can challenge the problem definition itself and drive the conversation rather than respond to it.
The probes that separate levels. The same four questions appear at every level, but the expected answer is completely different each time:
- "What if the same URL is submitted twice?"
- "How would you handle a URL that goes viral?"
- "How do you expire links without a full table scan?"
- "Walk me through a cache miss."
See Section 11 for exactly how the expected answer shifts from L3/L4 through L7/L8.
Requirements clarification
The first few minutes of a system design interview are the highest-leverage time you have. The requirements you establish here, and the ones you explicitly defer, shape every tradeoff you'll make for the rest of the conversation.
Functional requirements
These are the things the system does. Two are non-negotiable; the rest are worth raising as optional to signal breadth.
| Requirement | Notes |
|---|---|
| Shorten a long URL to a 7–8 character code | Core write path |
| Redirect a short URL to the original | Core read path, must be fast |
| Custom aliases | Optional; mention at L4+ |
| URL expiration | Optional; important for cleanup design |
| Analytics (click counts, geo, referrer) | Drives the 301 vs 302 decision later |
Non-functional requirements
| Property | Target |
|---|---|
| Availability | 99.99% (redirect failures are customer-visible) |
| Redirect latency | <10 ms p99 on cache hit; <50 ms p99 on cache miss |
| Durability | No silent loss of URL mappings |
| Analytics consistency | Eventually consistent. An async pipeline is fine. |
| Scale | 1M new URLs/day; 100:1 read/write ratio |
NFRs are where most of the interesting design tension lives. Each target above is a deliberate choice; the sections below unpack the reasoning behind each one, and what it directly drives in the architecture.
Availability: 99.99% (asymmetric across read and write)
99.99% means roughly 52 minutes of downtime per year. The justification is asymmetric: a redirect failure is immediately visible to an end user, meaning they clicked a link and got nothing. A creation failure is far less damaging; users can retry, or the failure is invisible entirely (a background process that tries again).
This distinction matters for the architecture. It means the read path needs more redundancy than the write path. Multiple read replicas, a warm cache, and a load balancer with fast health checks are all direct responses to this requirement. The write path, which at 11.5 QPS is barely a whisper, can tolerate a brief primary failover without meaningful user impact.
Redirect latency: <10 ms p99 on cache hit (the number that justifies the entire cache layer)
10 ms is aggressive. A round-trip to a database in the same data centre is typically 1–5 ms under ideal conditions, but p99 latency accounts for the worst 1% of requests: slow queries, lock contention, garbage collection pauses. Hitting the DB on every redirect makes it very hard to stay under 10 ms at the 99th percentile.
Redis, by contrast, responds in under 1 ms for a simple key lookup. That headroom (1 ms for the cache lookup, plus network overhead) is why the <10 ms target effectively mandates a cache. Without it, the target is aspirational rather than achievable.
The <50 ms target on a cache miss is more relaxed by design. A miss means a DB read, which is slower but still expected to be rare (80%+ of requests should hit the cache). Setting a separate, looser target for misses is honest engineering: you're acknowledging that the two paths have different cost profiles rather than pretending they're the same.
Durability: no silent loss of URL mappings (asymmetric with analytics)
A lost URL mapping is catastrophic in a way that a lost click event is not. If short.ly/abc123 disappears from the database, every future user who clicks that link gets a 404. There is no recovery. The original long URL is gone. This is why URL mappings must be stored in a durable, replicated database with write-ahead logging, not in Redis (which is a cache, not a database, despite being able to persist data).
The framing "no silent loss" is deliberate. It means a write must not return success unless the data is safely persisted. An acknowledged write that later disappears due to async replication lag on a primary failure is the failure mode to design against. This points toward synchronous replication or quorum writes for the URL mapping table.
Analytics consistency: eventually consistent is fine (a deliberate relaxation)
This is an explicit relaxation, and stating it out loud is just as important as stating the strict requirements. Not every part of the system needs the same guarantees. If a dashboard shows 9,842 clicks instead of 9,847 because five recent events are still in the Kafka pipeline, no user is harmed. The number will catch up within seconds.
This relaxation is what makes the async analytics architecture acceptable. If click counts needed to be exact and real-time, you'd need synchronous writes on the redirect path, which would blow the latency budget. By establishing eventual consistency as acceptable here, you unlock the entire async pipeline design: Kafka → Flink → ClickHouse, with the redirect path never waiting for any of it.
The key skill is knowing which parts of a system can tolerate relaxed consistency. URL mappings cannot. Click counts can. Stating that distinction explicitly and tracing it back to user impact is what separates a senior answer from a junior one.
Scale: 1M URLs/day, 100:1 read:write (an anchor, not a ceiling)
Treating scale as a fixed assumption rather than a range is a common mistake. 1M URLs/day and 100:1 is a reasonable starting anchor. It's the kind of number you'd get from a mid-size product. But the interviewer may nudge you toward 100M URLs/day, or toward a single viral event generating 50,000 QPS for 10 minutes. The architecture should be defensible across that range, not brittle at the edges.
The read:write ratio is the more structurally important number. At 10:1, you'd still care about performance but might question whether a cache layer is worth its complexity. At 100:1, the cache is obvious. At 1000:1 (a single viral URL), the cache becomes the only thing standing between you and a DB meltdown. Naming this ratio early anchors every caching discussion that follows.
A useful pattern: establish the baseline (1M/day, 100:1), then proactively mention how the design degrades and recovers at 10× scale. This shows you've thought beyond the happy path without needing the interviewer to push you there.
Out of scope (say it clearly): user authentication, billing, abuse detection, malware scanning. Mention that you'd address these given more time. Not raising them at all signals that you haven't thought about the system's full surface area.
Capacity estimation
The point of capacity estimation isn't to get a precise number. It's to understand the character of the system before you design it. A few rough calculations will tell you whether you're dealing with a read-heavy or write-heavy workload, whether storage fits on one machine or needs sharding, and how big the cache needs to be to actually help. Those answers shape every decision that follows.
There are four dimensions worth thinking through: traffic, storage, bandwidth, and cache. Each is worked through below.
Traffic: QPS and the read/write split (the most revealing number)
Start with what you can anchor to: how many new URLs are created per day? 1 million is a reasonable starting assumption for a mid-scale system. Converting to per-second is a one-step calculation: 1M ÷ 86,400 ≈ 11.5 writes/sec. That's modest. A single DB node handles this without breaking a sweat.
The more interesting number is reads. Ask yourself: for every URL created, how many times is it clicked? For a general-purpose shortener, 100:1 is a reasonable ratio. A marketing campaign URL might be 10,000:1. That gives you 1,150 reads/sec on average, but traffic is never flat. Apply a 5× peak multiplier for spikes (a tweet goes viral, a campaign launches), which puts peak reads at ~5,750/sec.
Why this matters: at 100:1, this system is almost entirely reads. That single insight justifies the entire caching strategy. If reads were only 2:1, you'd think about the design very differently.
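The arithmetic above can be rerun as a quick sketch with different assumptions (the inputs are the assumed figures, not measured values):

```javascript
// Back-of-envelope traffic model; inputs are assumptions, not measurements.
const urlsPerDay = 1_000_000;
const readWriteRatio = 100;
const peakMultiplier = 5;

const writesPerSec = urlsPerDay / 86_400;             // ~11.6 writes/sec
const readsPerSec = writesPerSec * readWriteRatio;    // ~1,157 reads/sec average
const peakReadsPerSec = readsPerSec * peakMultiplier; // ~5,787 reads/sec at peak

console.log({ writesPerSec, readsPerSec, peakReadsPerSec });
```

Swap in 100M URLs/day or a 1000:1 ratio and the same three lines tell you whether the design still holds.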
Storage: does this fit on one machine? (usually less scary than it sounds)
Each URL row needs: the short code (~8 bytes), the long URL (average ~200 bytes for a real URL, but allow 500 bytes including metadata columns). At 1M rows/day for 5 years: 1M × 365 × 5 = 1.8 billion rows, totalling roughly 900 GB.
900 GB fits comfortably on a single modern database server. This is an important realisation: you don't shard for storage capacity here, you shard for availability and read throughput. Sharding a database that doesn't need it for capacity adds operational complexity for no gain; knowing this lets you justify starting with a simpler single-primary and read replica setup.
The clicks table is a different story. At 100:1, you're storing 100M click events per day, about 36 billion rows per year at just the base URL creation rate. At even 100 bytes per click row, that's 3.6 TB/year. This is why clicks needs a separate, write-optimised store from day one of the scaled design.
Bandwidth: usually not the bottleneck (worth a quick check)
For the write path: 11.5 writes/sec × 500 bytes ≈ 6 KB/sec inbound. Negligible. For the read path: a redirect response is tiny, just an HTTP 302 with a Location header, roughly 200 bytes. At 5,750 peak reads/sec that's about 1.1 MB/sec outbound. Again, not a concern.
Bandwidth becomes interesting only if you serve the destination page through the shortener (you don't), or if you add analytics pixel tracking with embedded images. For a pure redirect service, bandwidth is never the constraint. Saying so in an interview shows you've thought about it and ruled it out rather than just ignoring it.
Cache sizing: how much memory do you actually need? (smaller than you'd expect)
URL access follows a power-law distribution. A small fraction of URLs receive the vast majority of traffic. Caching the hot 20% of the working set typically yields an 80%+ cache hit rate. The question is: what does 20% actually cost in RAM?
The total corpus after 5 years is roughly 1M new URLs/day × 365 × 5 = 1.8B URLs, but on any given day only a slice of those are active. A reasonable model: cache the top 20% of daily active URLs. If 10M URLs are active on a given day, caching 2M × 500 bytes = 1 GB of Redis memory covers the hot tail. Even at 10× that scale, you're talking 10 GB. A single Redis node comfortably holds this.
Why this matters for the design: knowing the cache fits entirely in memory on one or two nodes means you can justify a simple Redis setup rather than immediately reaching for Redis Cluster. Scale the architecture to what the numbers actually demand.
The key insight these numbers reveal: this system is overwhelmingly read-heavy, storage is manageable, bandwidth is a non-issue, and the cache footprint is small. That combination (high read ratio, cheap cache, modest storage) is exactly why a simple cache-aside pattern with Redis works so well here. The architecture follows naturally from the numbers.
High-level architecture
Sketch the full overview before diving into any component: establish the complete picture, then drill down. This is the structure you should draw on the whiteboard in the first 10 minutes.
Component breakdown
Load balancer. L7 (HTTP-aware), distributes traffic across stateless API servers. Can route /create (write path) separately from /:code (read path) for independent scaling.
API servers. Fully stateless; no session affinity required. This is the core architectural choice that enables horizontal autoscaling. Every request contains everything needed to process it.
Redis cache. The most important component for read performance. With a 20% working-set cache and a power-law click distribution, ~80% of redirects are served without touching the database.
Database. Stores URL mappings durably. MySQL works at moderate scale; Cassandra or DynamoDB for large-scale write throughput and horizontal sharding.
Analytics service. Fully asynchronous. Clicks are published to Kafka, never blocking the redirect response. A stream processor (Spark/Flink) aggregates to a columnar store.
Architectural rationale
Every component is a deliberate answer to a specific constraint. The sections below walk through the reasoning behind each choice, which is what interviewers at L5+ are listening for.
① Load balancer (availability & independent scaling)
Why L7 over L4? An L7 (HTTP-aware) load balancer lets you route by URL path. The write path (POST /shorten) and read path (GET /:code) have completely different traffic patterns. Reads are 100× more frequent. L7 lets you scale them independently, routing them to separate server pools, without any client changes.
Why not DNS round-robin? DNS has no health-check awareness. A failed server receives traffic until the DNS TTL expires, which is potentially minutes. A proper load balancer detects unhealthy nodes within seconds and removes them. That responsiveness is essential for a 99.99% availability target.
② API servers (statelessness is the key decision)
Why stateless? A stateless server holds no session data. Every request carries everything needed to process it. This is the single architectural decision that makes horizontal autoscaling trivial: spin up more servers, point the load balancer at them, done. No sticky sessions, no session replication, no coordination overhead.
What state is explicitly excluded? No in-flight request state, no user sessions, no per-connection counters. The only "state" is the pre-fetched ID range in memory (counter-based encoding), which is safe to lose on restart, since the server simply fetches a new range from Zookeeper.
③ Redis cache (the highest-impact component)
Why is this the most important component? URL access follows a heavy power-law distribution, where a small fraction of URLs (viral links, popular campaigns) receives the vast majority of clicks. Caching 20% of the working set by count typically yields 80%+ hit rates. Without cache, every redirect hits the DB. With it, most complete in ~1 ms with no DB I/O whatsoever.
Why Redis over Memcached? Redis supports native TTL per key (critical for URL expiration), persistence options (RDB/AOF snapshots for warm-up after restart), and Cluster mode for horizontal sharding. Memcached is marginally faster for pure string caching but lacks all three of these operationally important features.
Cache invalidation on delete: issue a DEL short_code to Redis alongside the DB delete, so a deleted URL can't continue to be served from cache.
④ Database (durability anchor of the system)
Why not use Redis as the primary store? Redis is an in-memory store. Data loss on failure is possible without careful persistence configuration, and even then it lags behind a relational DB in durability guarantees. The database is the source of truth; the cache is a performance optimisation layered on top of it.
MySQL vs Cassandra: when to switch? MySQL handles hundreds of millions of URL records comfortably on a single node with read replicas. Reach for Cassandra (or DynamoDB) when you need multi-region active-active writes, or when manual sharding in MySQL becomes operationally painful. For most interview scenarios, starting with MySQL + read replicas is correct; jumping straight to Cassandra invites difficult questions about tunable consistency.
⑤ Analytics service (why async is non-negotiable)
Why not log clicks synchronously in the redirect handler? A redirect must complete in <10 ms. Even a fast synchronous DB write adds 5–15 ms on every redirect. At 1,000 redirects/sec that's significant load; at 50,000/sec after a viral link, it becomes a bottleneck that degrades the entire service for all users. Decoupling with Kafka means the redirect path never waits on analytics.
Why Kafka over a background thread writing to the DB? Kafka is a durable buffer. If the consumer falls behind during a traffic spike, events queue safely with no data loss and no backpressure on API servers. A background thread writing directly to the DB has no such buffer; if the DB slows down, the thread queue grows in the API server's memory until OOM.
Alternatives worth naming:
- Redis INCR (fast counts, but lossy and hard to enrich with geo/referrer)
- Direct DB write (simpler, but couples analytics to the redirect path)
- AWS Kinesis / GCP Pub/Sub (managed Kafka alternatives)
What the interviewer is really asking: "Walk me through why you made each of these choices." Candidates who can articulate the reasoning behind a component, not just what it does, are the ones who pass at L5+.
How this compares to Bitly and TinyURL
Both are production systems at significant scale, and their known architectures validate several of the choices above.
| Decision | Our design | Bitly (known) | TinyURL (known) |
|---|---|---|---|
| Short code generation | Counter + Base62, Zookeeper ranges | Custom hash + Base58 (avoids visually ambiguous chars like 0/O) | Sequential counter, Base36 for older links |
| Primary datastore | MySQL + read replicas → Cassandra at scale | Cassandra for URL mappings (known from engineering blog) | MySQL (still relational at their scale) |
| Caching | L1 local + L2 Redis | Redis + CDN edge caching for top URLs | In-process cache + Redis |
| Redirect type | 302 default, 301 opt-in | 301 for branded links (SEO benefit for customers); 302 for tracked links | 301 by default (analytics less central to their model) |
| Analytics pipeline | Kafka → Flink → ClickHouse | Kafka-based event pipeline, proprietary analytics layer | Basic click counts; no real-time analytics product |
Bitly's choice of 301 for branded links is worth noting. It's a deliberate product decision that trades click analytics for SEO link equity passing to the destination. This is the same tradeoff our design exposes as an opt-in. There's no universally correct answer; the right choice follows from requirements.
Encoding strategies
The core algorithmic question: given a long URL, how do you generate a short, unique, collision-free code? There are two dominant approaches: Approach A hashes the long URL (e.g. a truncated MD5/SHA digest), and Approach B encodes a globally unique counter-issued ID in Base62.
Recommended: Approach B with Zookeeper range allocation
Each API server pre-fetches a range of IDs (e.g., 1,000–2,000) from Zookeeper at startup. It burns through them locally without any network call. When the range is exhausted, it fetches the next range. This removes Zookeeper from the hot path entirely. The critical redirect path never touches it.
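A minimal sketch of the range-allocation pattern. `fetchRangeFromCoordinator` is a hypothetical stand-in for the Zookeeper call (in practice, an atomically incremented znode):

```javascript
// Stand-in for the coordinator call that atomically reserves
// the next block of IDs; nextRangeStart simulates its persisted counter.
let nextRangeStart = 1000;
function fetchRangeFromCoordinator(size) {
  const start = nextRangeStart;
  nextRangeStart += size;
  return { next: start, end: start + size }; // half-open range [start, start + size)
}

class IdAllocator {
  constructor(rangeSize = 1000) {
    this.rangeSize = rangeSize;
    this.range = fetchRangeFromCoordinator(rangeSize);
  }
  nextId() {
    // Hot path is a pure in-memory increment; the coordinator is
    // contacted only once per rangeSize IDs.
    if (this.range.next === this.range.end) {
      this.range = fetchRangeFromCoordinator(this.rangeSize);
    }
    return this.range.next++;
  }
}
```

A server that crashes loses at most its unissued range, which wastes some codes but can never issue a duplicate.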
// Base62 encoding
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
function toBase62(id) {
let result = '';
while (id > 0) {
result = ALPHABET[id % 62] + result;
id = Math.floor(id / 62);
}
return result.padStart(7, '0');
}
// ID=12345 → "00003d7", ID=3,521,614,606,207 → "zzzzzzz"
62⁷ = 3,521,614,606,208 unique codes, over 3.5 trillion. At 1 billion new URLs per day, this lasts nearly 10 years before you need an 8-character code.
One open question: what if the same URL is submitted twice?
Approach A (hash) handles this naturally. The same long URL always produces the same hash, so the same short code is returned. No extra work needed.
Approach B (counter) does not. Two submissions of https://example.com/same-url will get two different IDs and two different short codes. Whether this matters depends on your product requirements. If it doesn't (many shorteners intentionally allow multiple short codes per long URL, e.g. for tracking different campaigns), ignore it. If deduplication is needed, maintain a separate long_url → short_code lookup table with a unique index on long_url. This adds one DB read to every creation request: check the dedup table first, return the existing code if found, otherwise create a new one. At 11.5 writes/sec the overhead is trivial; at millions of writes/sec you'd shard the dedup table on a hash of the long URL.
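The dedup path can be sketched as follows, with a Map standing in for the `long_url → short_code` table and a placeholder encoder in place of the real `nextId()` + `toBase62()` pair:

```javascript
// Map stands in for a table with a UNIQUE index on long_url.
const dedupIndex = new Map();
let counter = 0;
const newCode = () => `c${++counter}`; // placeholder for nextId() + toBase62()

function createShortCode(longUrl) {
  const existing = dedupIndex.get(longUrl); // the one extra read per create
  if (existing) return existing;
  const code = newCode();
  // In a real DB, the unique index resolves the race between two
  // concurrent creates of the same long URL; one insert wins.
  dedupIndex.set(longUrl, code);
  return code;
}
```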
API design
The two endpoints are simple, but the design choices within each one are worth articulating.
The two core endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /shorten | POST | Create a new short URL |
| /:code | GET | Redirect to the original URL |
POST /shorten: create a short URL
Request:
// POST /shorten
{
"long_url": "https://example.com/very/long/path?with=params",
"custom_alias": "my-link", // optional
"expires_at": "2025-12-31T00:00:00Z" // optional
}
Response:
// 201 Created
{
"short_url": "https://short.ly/abc123",
"short_code": "abc123",
"long_url": "https://example.com/very/long/path?with=params",
"expires_at": "2025-12-31T00:00:00Z",
"created_at": "2025-03-28T10:00:00Z"
}
Why POST and not GET for URL creation? Idempotency and caching ›
GET requests are by definition idempotent and cacheable. Browsers, proxies, and CDNs feel free to cache them. A creation operation is neither. Submitting the same GET request twice could create two short codes (with Approach B). Using POST signals to the entire HTTP stack that this request has side effects and must not be cached or replayed automatically.
The response is 201 Created rather than 200 OK, a small detail that signals a resource was created and allows the Location header to point to the new resource if needed.
Input validation: what to check Security and correctness ›
Four things to validate on every creation request, in order of importance:
URL validity. Reject malformed URLs that would never resolve. Check for valid scheme (http:// or https://), reject bare IP addresses if you want to block direct IP linking, enforce a maximum length (URLs over ~2,000 chars cause issues in some browsers).
Custom alias availability and safety. Check against reserved words (api, health, admin, blog), profanity blocklist, and existing aliases. Return 409 Conflict if taken.
Rate limiting. Limit creation requests per API key and per IP. URL shorteners are frequently abused for spam and phishing; without rate limiting a single actor can generate millions of short codes.
Expiry sanity. If expires_at is provided, reject values in the past and values beyond a maximum retention window (e.g. 10 years).
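These checks can be sketched as a single validator; the length limit, reserved list, and return shape are illustrative assumptions, not canonical values:

```javascript
// Illustrative validation sketch; limits and reserved words are assumptions.
const RESERVED = new Set(['api', 'health', 'admin', 'blog']);
const MAX_URL_LENGTH = 2000;

function validateCreateRequest({ long_url, custom_alias, expires_at }) {
  let url;
  try {
    url = new URL(long_url);               // rejects malformed URLs
  } catch {
    return { ok: false, error: 'malformed URL' };
  }
  if (!['http:', 'https:'].includes(url.protocol)) {
    return { ok: false, error: 'unsupported scheme' };
  }
  if (long_url.length > MAX_URL_LENGTH) {
    return { ok: false, error: 'URL too long' };
  }
  if (custom_alias && RESERVED.has(custom_alias)) {
    return { ok: false, error: 'alias is reserved' }; // 409 Conflict in the API
  }
  if (expires_at && new Date(expires_at) <= new Date()) {
    return { ok: false, error: 'expiry in the past' };
  }
  return { ok: true };
}
```

Rate limiting sits outside this function, at the gateway or middleware layer, keyed by API key and IP.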
GET /:code: redirect to the original URL
// GET /abc123
// → 302 Found
// Location: https://example.com/very/long/path?with=params
// Cache-Control: no-store (prevents browser caching for analytics)
// → 404 Not Found (code doesn't exist or has expired)
// → 410 Gone (code existed but was deliberately deleted)
410 Gone vs 404 Not Found. A small detail that signals senior thinking. A 404 means "never heard of it." A 410 means "this existed and was intentionally removed." For deleted short URLs, 410 is the semantically correct response and tells crawlers to permanently remove the URL from their index. Most implementations just use 404 everywhere. Mentioning 410 is a differentiator.
Optional endpoints worth mentioning
| Endpoint | Method | Purpose | Level |
|---|---|---|---|
| /urls/:code | GET | Retrieve metadata for a short URL (owner only) | L4 |
| /urls/:code | DELETE | Delete a short URL | L4 |
| /urls/:code/analytics | GET | Click stats: counts, geo breakdown, referrers | L5 |
| /urls | GET | List all short URLs for the authenticated user | L5 |
| /bulk/shorten | POST | Create multiple short URLs in one request | L6 |
Redirect flow & 301 vs 302
The redirect is the highest-volume operation in the system. Every millisecond saved here compounds across billions of requests. The choice between 301 and 302, first flagged in the analytics NFR in Section 2, gets resolved here.
The 301 vs 302 decision: closing the loop from Section 2
In Section 2 we established analytics as a requirement and flagged it as eventually consistent: clicks need to be recorded, but not instantly. That single NFR resolves this choice.
| Status code | Browser behavior | Server load | Analytics |
|---|---|---|---|
| 301 Permanent | Caches redirect locally. Future clicks bypass your server | Minimal after first visit | ❌ Clicks not logged. Browser redirects silently |
| 302 Temporary | Re-requests on every click. No local caching | Every click hits your server | ✓ Full analytics. Every click is a server request |
The decision: analytics is a stated requirement, so 302 is the only valid default. Using 301 would silently discard click data after the first visit per browser. Offer 301 as an explicit opt-in per URL for users who want maximum redirect speed and have no need for click analytics. This is also the correct answer to "why not just use 301 for performance?" You'd be trading a stated requirement for a marginal latency saving on repeat visitors.
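The decision collapses into a small handler sketch; the record shape and the `permanentRedirectOptIn` flag are assumptions for illustration:

```javascript
// 302 by default so every click reaches the server; 301 only when the
// URL owner explicitly opted out of analytics. `record` is an assumed row shape.
function redirectResponse(record) {
  if (!record) return { status: 404 };            // never existed (or expired and purged)
  if (record.deleted) return { status: 410 };     // existed, intentionally removed
  if (record.expires_at && record.expires_at < Date.now()) return { status: 404 };
  return {
    status: record.permanentRedirectOptIn ? 301 : 302,
    headers: {
      Location: record.long_url,
      'Cache-Control': 'no-store', // keep browsers re-requesting, for analytics
    },
  };
}
```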
Data model
Don't reach for a schema immediately. Start by identifying the entities, then reason about the access patterns. The schema follows naturally from those two things.
Step 1: What entities do we have?
Think about the nouns in the system. A URL mapping is the core entity: a short code pointing to a long URL. A user may own that mapping (for management, analytics, custom aliases). A click event captures each redirect for analytics. That's it. Three entities, no more.
Step 2: What are the access patterns?
Each entity gets used very differently, and those differences end up shaping the schema more than anything else:
| Operation | Frequency | Query shape |
|---|---|---|
| Redirect: look up long URL by short code | Extremely high (every click) | Point lookup by short_code |
| Create: insert a new short URL | Low (100× less than reads) | Single-row insert |
| Analytics: count/group clicks per URL | Medium (dashboard queries) | Range scan by url_id + time |
| Owner lookup: list URLs by user | Low (management UI only) | Range scan by user_id |
| Expiry sweep: find expired URLs | Very low (background job) | Range scan by expires_at |
Step 3: What does this tell us about the schema?
Two things jump out immediately. First, the redirect lookup dominates every other operation by two orders of magnitude. The short_code must be indexed, and that index must stay small and fast. Second, click data has completely different access patterns from URL data: it is append-only, queried by time range, and will grow without bound. These two things should not live in the same table, and at scale they should not live in the same database.
A common early mistake is putting click counts directly on the urls row as a counter column. This creates a write hotspot. Every redirect hammers the same row with an UPDATE. Separate the concerns: keep urls append-mostly (created once, rarely updated) and handle click aggregation in the analytics pipeline.
The schema that follows from this reasoning
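As a sketch, the reasoning above lands on something like the following DDL. Column names beyond those discussed in this section (long_url, user_id, created_at, referrer) are illustrative, not canonical:

```sql
-- Representative sketch; types follow the discussion in this section.
CREATE TABLE urls (
  id          BIGINT PRIMARY KEY,         -- internal identity, decoupled from encoding
  short_code  VARCHAR(8) NOT NULL UNIQUE, -- external lookup key, the hottest index
  long_url    TEXT NOT NULL,
  user_id     BIGINT NULL,
  created_at  TIMESTAMP NOT NULL,
  expires_at  TIMESTAMP NULL,             -- NULL means "never expires"
  click_count BIGINT NOT NULL DEFAULT 0   -- denormalised, updated in batches
);

CREATE TABLE clicks (
  url_id     BIGINT NOT NULL,
  clicked_at TIMESTAMP NOT NULL,
  ip_hash    CHAR(64),                    -- salted hash, never the raw IP
  referrer   TEXT
);
CREATE INDEX idx_clicks_url_time ON clicks (url_id, clicked_at);
```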
Why each field and index?
The schema is a conclusion, not a given. The subsections below explain why each column and index was chosen, and what would break without it.
urls.short_code (VARCHAR(8) UNIQUE): the most critical column
This column carries the most important index in the system. Every redirect, the dominant operation, is a point lookup by short_code. A unique B-tree index on it gives O(log n) lookups in the worst case, but with a hot working set in the buffer pool, it is effectively O(1) in practice.
Why VARCHAR(8) not VARCHAR(255)? Index pages are fixed-size. Narrower columns pack more entries per page, which means more of the index fits in memory. For a table that could hold hundreds of millions of rows, this difference matters. 8 characters comfortably covers a 7-character Base62 code with one spare byte.
Why store both id and short_code? The id is the internal primary key used for joins (e.g., clicks.url_id). The short_code is the external lookup key. Keeping them separate means your internal row identity is decoupled from the URL encoding scheme. If you ever change encoding strategies, you don't need to rewrite foreign keys across the database.
urls.expires_at (TIMESTAMP NULL): nullable by design
Most URLs never expire, so the column is nullable. A NULL value means "no expiry". This avoids storing a sentinel value like 9999-12-31 which looks awkward and can confuse application code.
Do not index this column on the main table. Expiry sweeps are an infrequent background job; they don't need sub-millisecond performance. Adding an index on expires_at would slow down every insert for negligible gain. If sweep performance becomes a concern, run the job against a replica, or maintain a separate lightweight table of (short_code, expires_at) that is cheap to scan.
How does this interact with the cache? When a URL is created with an expiry, the cache TTL should be set to min(24h, expires_at − now). This ensures the cache entry naturally falls away when the URL expires, without needing a proactive invalidation sweep.
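That TTL rule is a one-liner worth writing out (a sketch; the 24h default is the assumption stated above):

```javascript
// Cache TTL: cap at 24h, but never outlive the URL's own expiry.
const DAY_SECONDS = 24 * 60 * 60;

function cacheTtlSeconds(expiresAtMs, nowMs = Date.now()) {
  if (expiresAtMs == null) return DAY_SECONDS;            // no expiry: default TTL
  const untilExpiry = Math.floor((expiresAtMs - nowMs) / 1000);
  return Math.max(0, Math.min(DAY_SECONDS, untilExpiry)); // min(24h, expires_at - now)
}
```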
urls.click_count (BIGINT): a deliberate denormalisation
Strictly speaking, this is redundant. You can derive the total click count with SELECT COUNT(*) FROM clicks WHERE url_id = ?. But at scale that query becomes expensive if the clicks table has billions of rows. A denormalised counter on the urls row enables instant display of a click count in the UI.
Why not update it synchronously on every redirect? This is the counter hotspot problem. If a URL receives 10,000 clicks per second, you'd have 10,000 concurrent UPDATE urls SET click_count = click_count + 1 statements fighting for a row lock. The right approach is to update it periodically via the analytics pipeline, batch-aggregate click events in Kafka, then apply a single UPDATE once per minute per URL. Eventual consistency is acceptable here; users don't need sub-second click count precision.
Expose the staleness explicitly (e.g. alongside a count_updated_at timestamp). Callers who need exact real-time counts should query the analytics store directly.
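The batch-aggregation step is simple to sketch. The event shape is assumed; in production the input would be a window of events consumed from Kafka:

```javascript
// Aggregate a window of click events, then emit one update per URL
// instead of one per click (avoiding the counter hotspot).
function aggregateClicks(events) {
  const counts = new Map();
  for (const { url_id } of events) {
    counts.set(url_id, (counts.get(url_id) || 0) + 1);
  }
  // Each entry becomes a single statement, e.g.:
  // UPDATE urls SET click_count = click_count + ? WHERE id = ?
  return [...counts].map(([urlId, delta]) => ({ urlId, delta }));
}
```

10,000 clicks on one URL in a window become one UPDATE with delta 10,000, not 10,000 contending row locks.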
The clicks table: why it's separate (different access patterns, different store)
clicks is append-only and grows without bound. A URL shortener with 1M new URLs/day and a 100:1 read:write ratio generates 100M click events per day (about 1,000 rows per second, continuously). Within a year that's 36 billion rows. That volume is incompatible with a traditional RDBMS table sitting alongside the operationally critical urls table.
The solution: separate the store at the infrastructure level. At L4, putting clicks in a separate table in the same DB is acceptable. At L5/L6, it belongs in a write-optimised wide-column store (Cassandra, Bigtable) or streamed directly to a columnar analytics store (ClickHouse). At L7/L8, raw click events live in S3 Parquet partitioned by date, queryable via Athena or BigQuery, with ClickHouse serving as the hot-query layer for recent data.
Why store ip_hash not ip_address? Raw IP addresses are PII under GDPR and CCPA. Storing a one-way hash of the IP (salted and rotated) lets you perform deduplication (unique visitors) without storing the personal data itself. Mention this proactively. It signals awareness of compliance requirements.
One tradeoff to name: moving clicks into a different store means you lose foreign-key guarantees, making orphaned click records for deleted URLs possible. Handle this at the application layer: when a URL is deleted, publish a tombstone event to the analytics pipeline so downstream consumers can filter or purge the associated clicks.
Index on clicks(url_id, clicked_at): composite ordering matters
The dominant analytics query is: "give me all clicks for URL X between time A and time B." A composite index (url_id, clicked_at) handles this perfectly. The leftmost column (url_id) filters down to a specific URL's rows, and the rightmost column (clicked_at) supports efficient range scans within that set.
Why not (clicked_at, url_id)? Reversing the order would make the index useful for "give me all clicks across all URLs in a time range", a much less common query. Always index in the order that matches your most frequent access pattern: filter first, range second.
At large scale, this index itself becomes enormous. When the clicks table moves to Cassandra, the partition key becomes url_id and the clustering column becomes clicked_at DESC, achieving the same access pattern but distributed across nodes automatically.
Schema decisions by level
L3/L4 Identify the three entities, list the access patterns, arrive at the schema. Explain the short_code unique index and why it is the most important one.
L5/L6 Separate clicks into a write-optimised store. Discuss sharding urls on short_code via consistent hash. Mention the ip_hash vs raw IP tradeoff.
L7/L8 Discuss the clicks archival pipeline: hot tier (ClickHouse, last 30 days) vs cold tier (S3 Parquet, partitioned by month). Raise the dual-write risk during migration and how to handle it with a feature flag and backfill job. Discuss tombstone events for deleted URLs.
Caching strategy
Sections 3 and 4 established why the cache exists. The 100:1 read ratio and the <10 ms latency target make it non-optional. This section covers the how: how many layers, what belongs in each, how eviction works, and what to do when data changes.
What gets cached
The short_code → long_url mapping. Small (~500 bytes), written once, rarely updated, and queried on every single redirect. It is close to a perfect cache candidate.
Cache hierarchy
| Layer | Location | Hit latency | Size | Eviction |
|---|---|---|---|---|
| L1 (local) | Each API server process memory | ~0.1 ms | ~1,000 entries | TTL = 1 second |
| L2 (distributed) | Redis Cluster | ~1 ms | 20% of working set | LRU + TTL = 24h |
| L3 (database) | MySQL / Cassandra | ~10 ms | Full dataset | — |
Expand the sections below for the reasoning behind each layer.
Why L1 (local) in addition to L2 (Redis)? Not redundancy: different failure modes
Redis handles most traffic well, but a single viral URL can saturate the network connection to the specific Redis shard holding that key, even if Redis itself is healthy. The L1 local cache on each API server handles this case: requests served from L1 never touch the network at all. With a 1-second TTL, each API server effectively rate-limits its own outbound Redis traffic to at most one request per key per second, regardless of incoming load.
The two layers also fail independently. If Redis goes down, L1 continues serving hot entries for up to 1 second, during which a circuit breaker can activate and route misses directly to the DB rather than hammering a failing Redis. Without L1, a Redis failure immediately becomes a DB flood.
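The two-tier lookup can be sketched as follows (a hypothetical class; plain dicts stand in for the Redis client and the database, and the 1-second L1 TTL mirrors the table above):

```python
import time

class TwoTierCache:
    """L1: per-process dict with a 1-second TTL.
    L2: a Redis-like store (a dict here, standing in for a client).
    L3: the database (also a dict in this sketch)."""

    def __init__(self, redis, db, l1_ttl=1.0):
        self.redis, self.db = redis, db
        self.l1 = {}          # short_code -> (long_url, fetched_at)
        self.l1_ttl = l1_ttl

    def get(self, code):
        hit = self.l1.get(code)
        if hit and time.monotonic() - hit[1] < self.l1_ttl:
            return hit[0]                  # L1 hit: no network at all
        url = self.redis.get(code)         # L2: one Redis round trip
        if url is None:
            url = self.db[code]            # L3: database fallback
            self.redis[code] = url         # repopulate L2
        self.l1[code] = (url, time.monotonic())
        return url
```

However many requests arrive for one key, each process issues at most one Redis read per key per second.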
LRU eviction vs TTL: why both? Different problems, different tools
LRU eviction handles memory pressure: when the cache fills up, the least recently used entries are dropped to make room. TTL handles correctness: entries that haven't been evicted are still expired after 24 hours, preventing perpetually stale data from accumulating for obscure URLs that were cached long ago and never requested since.
They solve different problems and you need both. LRU alone means a long-tail URL cached six months ago stays forever, even if the underlying long URL was changed. TTL alone means a popular URL that's accessed every second still gets evicted after 24 hours and causes an unnecessary DB read.
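A minimal cache combining both mechanisms (a hypothetical sketch; `now` is injectable to make the expiry behaviour explicit rather than wall-clock dependent):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """LRU handles memory pressure; TTL handles staleness."""

    def __init__(self, capacity, ttl):
        self.capacity, self.ttl = capacity, ttl
        self.data = OrderedDict()   # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl:     # TTL: drop stale entries
            del self.data[key]
            return None
        self.data.move_to_end(key)         # LRU: mark recently used
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.data[key] = (value, now)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity: # LRU: evict least recent
            self.data.popitem(last=False)
```

Removing either branch reproduces one of the failure modes above: no TTL check and long-tail entries go stale forever; no eviction and memory grows without bound.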
For URLs with an expiry date set by the user, override the TTL: set it to min(24h, expires_at − now). This ensures the cache entry expires no later than the URL itself, without needing a separate invalidation sweep.
Cache invalidation on write: rare but must be immediate
URL updates and deletions are uncommon; users rarely change a short URL once created. But when they do happen, the cache must be invalidated immediately or you risk serving a stale (or deleted) URL for up to the TTL duration. The correct approach is synchronous invalidation: on any write to the urls table, issue a DEL short_code to Redis as part of the same operation (not as a background job). If the DEL fails, the write should still succeed (the TTL will eventually clean it up), but log the failure and alert.
This is a case where cache invalidation is simpler than it looks. Because short_code is the exact cache key, there's no partial invalidation, no cache stampede, and no need for versioning. Delete the key; the next request repopulates it from the DB.
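The write path can be sketched as follows (a hypothetical helper; a dict's `pop` stands in for the Redis DEL, and an invalidation failure is logged rather than propagated, as described above):

```python
def update_url(db, redis, short_code, new_long_url, log=print):
    """Synchronous invalidation: the DEL happens in the write path,
    but a DEL failure must not fail the write. The 24h cache TTL
    eventually cleans up any key the DEL missed."""
    db[short_code] = new_long_url
    try:
        redis.pop(short_code, None)   # stands in for Redis DEL
    except Exception as exc:
        # Tolerate, but make noise: this window serves stale data.
        log(f"cache invalidation failed for {short_code}: {exc}")
```

The next redirect for this code misses the cache and repopulates it from the DB.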
Deep-dive scalability L5+
The high-level architecture in Section 4 works well at moderate scale. This section covers the decisions that come under pressure at 10× and 100×. These are the ones you need to reason about at Senior level and above. The diagram shows the full expanded system; the sections below explain the reasoning behind each scaling choice.
Database sharding: why and on what key? Shard key choice is the whole game
As established in capacity estimation, the urls table at moderate scale (~900 GB) fits on a single node and doesn't need sharding for capacity. You shard when read throughput exceeds what a single primary + replica set can serve, or when you need geographic distribution for latency.
The shard key is short_code, using consistent hashing. The reasoning is direct: every redirect is a point lookup by short_code, so sharding on it ensures each request goes to exactly one shard, with no cross-shard queries and no scatter-gather aggregation. The alternative (sharding on user_id) would be wrong: it optimises for the management UI (low traffic) at the expense of the redirect path (the dominant operation).
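A minimal consistent-hash ring over short_code (a sketch: MD5 and 100 virtual nodes per shard are illustrative choices, not prescriptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps each short_code to exactly one shard. Virtual nodes
    smooth the key distribution across shards."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def shard_for(self, short_code):
        # First ring position at or after the key's hash, wrapping.
        i = bisect.bisect(self.keys, self._hash(short_code)) % len(self.ring)
        return self.ring[i][1]
```

Every redirect computes one hash and lands on one shard; adding a shard remaps only the keys adjacent to its virtual nodes rather than rehashing everything.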
Distributed ID generation: Zookeeper vs Snowflake. Keep it off the hot path
A global auto-incrementing counter is the simplest ID generator but a single point of contention at scale. Two patterns eliminate that contention:
Zookeeper range allocation: each API server pre-fetches a range of IDs (e.g. 1,000–2,000) from Zookeeper at startup. It burns through them locally with no network calls. When exhausted, it fetches the next range. Zookeeper sees only one request per thousand URLs created, effectively removing it from the hot path. The risk: if a server crashes mid-range, those IDs are lost. Given the key space is 3.5 trillion, a few thousand lost IDs is irrelevant.
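The range-allocation pattern in miniature (the `fetch_range` callback stands in for the Zookeeper round trip; all names here are hypothetical):

```python
import threading

class RangeAllocator:
    """Each API server leases a block of IDs and burns through it
    locally; only block exhaustion triggers a coordinator call."""

    def __init__(self, fetch_range):
        self._fetch_range = fetch_range
        self._lock = threading.Lock()
        self._next = self._end = 0   # empty: first next_id() fetches

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                self._next, self._end = self._fetch_range()
            self._next += 1
            return self._next - 1

# Toy coordinator handing out blocks of 1,000 IDs.
counter = {"hi": 0}
def fetch_range(block=1_000):
    lo = counter["hi"]
    counter["hi"] += block
    return lo, lo + block
```

One coordinator call per thousand IDs; a crash mid-range leaks at most one block, which is negligible against the 3.5-trillion key space.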
Snowflake IDs: a 64-bit value composed of a timestamp (41 bits), datacenter ID (5 bits), machine ID (5 bits), and per-machine sequence (12 bits). Entirely self-contained, requiring no coordination or single point of contention. The tradeoff is clock skew: if a machine's clock jumps backward, ID monotonicity breaks. Handle with a clock guard that refuses to generate IDs if the system clock regresses.
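A sketch of that layout with the clock guard (the custom epoch is arbitrary; the bit widths match the 41/5/5/12 split described above):

```python
import time

EPOCH_MS = 1_600_000_000_000  # custom epoch; arbitrary in this sketch

class Snowflake:
    """64-bit ID: 41-bit timestamp | 5-bit DC | 5-bit machine | 12-bit seq.
    Refuses to issue IDs if the clock runs backward."""

    def __init__(self, dc_id, machine_id,
                 clock=lambda: int(time.time() * 1000)):
        assert 0 <= dc_id < 32 and 0 <= machine_id < 32
        self.dc_id, self.machine_id, self.clock = dc_id, machine_id, clock
        self.last_ms, self.seq = -1, 0

    def next_id(self):
        now = self.clock() - EPOCH_MS
        if now < self.last_ms:
            raise RuntimeError("clock moved backward; refusing to issue IDs")
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF   # 4,096 IDs/ms/machine
            if self.seq == 0:                   # exhausted: wait a ms
                while now <= self.last_ms:
                    now = self.clock() - EPOCH_MS
        else:
            self.seq = 0
        self.last_ms = now
        return (now << 22) | (self.dc_id << 17) | (self.machine_id << 12) | self.seq
```

IDs are roughly time-ordered and require no coordination between machines.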
Analytics pipeline: why this sequence of components? Each stage has a specific job
The pipeline is Kafka → Flink/Spark → ClickHouse → S3, and each stage exists for a specific reason. Kafka is the durable buffer: it decouples the API servers (producers) from the analytics processors (consumers), absorbs traffic spikes without backpressure, and retains events for 7 days allowing replay if a consumer has a bug. Flink or Spark Streaming aggregates raw click events into per-URL, per-hour, per-country counts, which is the shape your analytics dashboard actually queries. ClickHouse stores the aggregated metrics and answers dashboard queries in milliseconds due to its columnar storage format. S3 Parquet holds the raw events, partitioned by date, as the long-term archive for ad hoc queries and backfill.
Why not write directly from Kafka to ClickHouse? You could, but the aggregation step in Flink is important: it reduces 100M raw click rows/day to a far smaller set of pre-computed aggregates, keeping ClickHouse fast and cheap. Without it, every dashboard query would scan billions of rows.
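The aggregation step in miniature (Python standing in for the Flink job; the event field names are hypothetical):

```python
from collections import Counter

def aggregate_window(events):
    """Reduce raw click events to (short_code, hour, country) counts,
    which is the shape the dashboard actually queries."""
    agg = Counter()
    for e in events:
        hour = e["ts"] - e["ts"] % 3600      # truncate to the hour
        agg[(e["code"], hour, e["country"])] += 1
    return agg

events = [
    {"code": "abc", "ts": 7200 + i, "country": "US"} for i in range(500)
] + [{"code": "abc", "ts": 7200, "country": "DE"}]
rows = aggregate_window(events)
# 501 raw events collapse to 2 aggregate rows.
```

This is the reduction that keeps ClickHouse scanning thousands of pre-computed rows instead of billions of raw ones.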
Geo-distribution: what breaks at global scale? Replication lag is the core tradeoff
GeoDNS routes users to the nearest regional cluster, reducing redirect latency from ~100 ms (cross-continent) to ~5–10 ms (local region). Each region gets its own Redis cache and a read replica of the DB. URL creation writes always go to the global primary. You never do active-active writes to avoid split-brain.
The failure mode to reason through: a URL is created in region A, and a user in region B clicks it 200 ms later, before the async replication has propagated the new row to region B's replica. The correct behaviour is not a 404 but a fallback: on a cache miss that also misses the local replica, the read path falls through to the global primary. This adds latency for that one request but guarantees correctness. P99 replication lag between regions is typically under 100 ms, so this fallback is rare in practice.
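The fallback read path as a sketch (dicts stand in for the regional cache, the regional replica, and the global primary):

```python
def resolve(short_code, cache, local_replica, global_primary):
    """Regional read path: cache, then local replica, then (rarely)
    the global primary when replication hasn't caught up yet."""
    url = cache.get(short_code)
    if url is not None:
        return url
    url = local_replica.get(short_code)
    if url is None:
        # Replication lag: the row may exist but hasn't reached this
        # region yet. A cross-region read trades latency for correctness.
        url = global_primary.get(short_code)
    if url is not None:
        cache[short_code] = url
    return url
```

A genuinely nonexistent code still returns nothing after the primary check, so 404s remain correct; only the freshly created, not-yet-replicated case pays the cross-region hop.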
Failure modes & edge cases
Proactively raising these signals L5+ thinking. Don't wait for the interviewer to probe; mention them as you walk through the design.
| Scenario | Problem | Solution | Level |
|---|---|---|---|
| URL expiry | Expired URLs still served until TTL expires | Background sweep job + proactive cache invalidation on expiry. Set cache TTL = min(24h, expires_at − now). | L4 |
| Same long URL submitted twice | Creates duplicate short codes | On hash approach: same code naturally. On counter approach: optional dedup table with long_url → short_code mapping. | L4 |
| Viral / hot-key URL | Single Redis key overloaded | L1 local in-process cache (1s TTL) on each API server. Reduces Redis traffic by 100–1000×. | L5 |
| Custom alias conflicts | User requests an alias that's taken or reserved | Check against a blocklist (reserved words, profanity). Rate limit per user. Return 409 on conflict. | L5 |
| Malicious URLs | Short URL masking phishing / malware | Async Google Safe Browsing API check on creation. Flag URL, show warning interstitial page. Never block the write path synchronously. | L5/L6 |
| Redis cache failure | All traffic falls through to DB | Circuit breaker on the cache layer. DB can handle the fallback load with read replicas. Alert and restore quickly. | L6 |
| Analytics data loss | Kafka consumer lag / consumer crash | Kafka retention of 7 days. Consumer group offset management. Idempotent click writes (dedup on click_id). | L7 |
| Multi-region consistency | Redirect arrives before replication completes | Fall back to global primary on regional cache miss. Accept eventual consistency on reads; never accept it on writes. | L7/L8 |
How to answer by level
The same question, "design a URL shortener", has a different correct answer depending on the level being evaluated. What changes isn't the system; it's the frame. Lower levels are evaluated on correctness and coverage. Higher levels are evaluated on judgment, tradeoff reasoning, and how independently you drive the conversation.
L3 / L4 (SDE I / SDE II): Can you build a working system?
Strong signals:

- Arrives at the high-level architecture without being led
- Chooses Base62 encoding and explains the math
- Identifies short_code as the critical index
- Puts Redis in front of the DB without being asked
- Resolves 301 vs 302 with a clear justification
- Produces plausible capacity numbers

Red flags:

- No cache layer, DB handles all reads
- Analytics written synchronously in the redirect path
- Hash collisions not addressed
- No index discussion at all
- 301 vs 302 not mentioned
- Skips capacity estimation entirely
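The Base62 math named in the signal list is only a few lines (the alphabet ordering is a convention, not a requirement; 62^7 ≈ 3.5 trillion codes fit in 7 characters):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    """Encode a numeric ID (e.g. from a counter or Snowflake) as a
    short code by repeated division by 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def decode_base62(s):
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```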
L5 (Senior SDE): Do you understand the tradeoffs you made?
- Identifies tradeoffs in their own design unprompted
- Explains why MySQL before Cassandra, not just which
- Raises hot-key problem and the L1 cache solution
- Designs the Kafka analytics pipeline end-to-end
- Discusses sharding strategy and shard key choice
- Connects NFRs back to specific architectural choices
- L4 names components; L5 justifies them
- L4 waits to be asked about edge cases; L5 raises them
- L4 picks Cassandra because it "scales"; L5 explains when not to
- L4 describes the cache; L5 reasons about hit rates and eviction
- L4 knows 302 is correct; L5 connects it back to the analytics NFR
L6 (Staff SDE): Can you own this system end-to-end?
- Surfaces hidden requirements before the interviewer does: GDPR, multi-tenancy, abuse
- Discusses operational concerns: deploy strategy, SLOs, alerting, on-call burden
- Proposes a migration path: how you'd move from v1 to v2 safely
- Identifies organisational dependencies (who owns the analytics store?)
- Reasons about the system under failure, not just the happy path
- Drives the conversation and asks better questions than the interviewer
- L5 designs the system; L6 designs the system and its lifecycle
- L5 raises edge cases; L6 raises cross-cutting concerns across teams
- L5 answers the interviewer's questions well; L6 changes the questions
- L5 discusses operational topics when asked; L6 introduces them naturally
L7 / L8 (Principal / Distinguished, Google L8): Should we be building this, and how?
- Challenges the requirements themselves: why 99.99%, what's the cost of each additional nine?
- Reframes the problem space: custom domains, white-label, multi-region regulatory partitioning
- Introduces long-horizon thinking: 5-year capacity, platform extensibility, API versioning
- Connects technical decisions to business outcomes explicitly
- Identifies the highest-risk assumptions and proposes how to validate them early
- Discusses build vs buy vs adopt (managed Kafka vs Kinesis vs in-house)
- L6 owns a system; L7/L8 shapes what systems get built
- L6 proposes good solutions; L7/L8 questions whether the solution space is right
- L6 thinks about SLOs; L7/L8 thinks about the business model behind the SLO
- L6 designs for current scale; L7/L8 designs for optionality at future scale
- The interviewer at this level is a peer. The conversation is collaborative, not evaluative
Classic probes: how the expected answer shifts by level
The same question has a different correct answer depending on who's being evaluated. The depth of reasoning is the signal, not the conclusion.
| Probe | L3/L4 | L5/L6 | L7/L8 |
|---|---|---|---|
| "What if the same URL is submitted twice?" | Describes the dedup table: same long URL should return the same short code | Explains why the hash approach handles it naturally and the counter approach doesn't; discusses the storage cost of the dedup index on long_url | Raises whether dedup is the right default at all. Multiple codes per URL is intentional for campaign tracking. Frames it as a product decision, not just a technical one |
| "What's the biggest bottleneck?" | The database, as it has to serve every redirect | The DB at low scale (solved by caching), then ID generation at high scale (solved by Zookeeper ranges or Snowflake) | Depends on traffic shape; explains the analysis before answering. Then reframes: at true scale the bottleneck is often organisational (who owns the on-call, which team controls the analytics store) |
| "How do you expire URLs without a full table scan?" | Background job that sweeps for expired rows; set cache TTL to min(24h, expires_at − now) | Run the sweep against a read replica to avoid impacting the primary; maintain a lightweight (short_code, expires_at) index table to make the sweep cheap | Questions whether expiry is enforced at the DB layer at all. At scale, soft-delete with async cleanup via a Kafka tombstone event is more operationally tractable than synchronous row deletion |
| "Walk me through a cache miss" | Cache miss → DB lookup → cache the result → return redirect | Same, but notes the stampede risk: if 10,000 requests miss simultaneously for the same cold key, all hit the DB at once. Mitigate with a per-key mutex or probabilistic early expiration | Distinguishes cold miss (key never cached) vs expired miss (key evicted) vs invalidated miss (key explicitly deleted). Each has different frequency and mitigation strategy |
How the pieces connect
Every architectural decision in this article traces back to a small number of observations established early on. The chain is worth assembling in one place:
1. Analytics is a stated NFR (§2) → 302 is the only valid default redirect code (§6)
2. <10 ms p99 latency NFR (§2) → DB alone cannot hit p99 → Redis cache is mandatory, not optional (§8)
3. 100:1 read:write ratio (§3) → power-law access distribution → 20% cache covers 80%+ of traffic → 1–10 GB Redis is sufficient (§3, §8)
4. Analytics consistency can be eventual (§2) → analytics writes can be async → Kafka pipeline never blocks the redirect path (§4, §9)
5. Redirect is a point lookup by short_code → shard key must be short_code → consistent hash on short_code routes every request to one shard (§7, §9)
6. URLs need durability (§2), clicks need write throughput (§3) → different consistency requirements → different stores (§7)
Related guides:
- Rate Limiter System Design — atomic Redis operations, distributed race conditions, and multi-tier quota enforcement
- Web Crawler System Design — Bloom filter deduplication, politeness throttling, and distributed frontier design
- Twitter Feed System Design — fan-out write amplification, hybrid push/pull strategy, and celebrity threshold design
- Notification Service System Design — multi-channel delivery, idempotency keys, and priority queues at scale
- Key-Value Store System Design — consistent hashing, eviction policies, replication, and failure modes
- Search Autocomplete
- Chat System (WhatsApp) System Design — WebSocket architecture, message delivery guarantees, and fan-out