Design a URL Shortener: System Design Interview Guide
Simple to describe, hard to scale. A complete walkthrough from requirements to production-grade architecture, with the reasoning behind every decision.
What the interviewer is testing
The URL shortener is deceptively simple to describe and genuinely hard to scale. That gap is exactly why it's the canonical system design question. It forces you to reason about capacity, caching, encoding, sharding, and analytics pipelines all at once, in a system everyone already understands.
What changes across levels isn't the system, it's the frame. At L3/L4 the question is whether you can arrive at a working design independently. At L5/L6 it's whether you understand the tradeoffs in your own choices before being asked. At L7/L8 it's whether you can challenge the problem definition itself and drive the conversation rather than respond to it.
The probes that separate levels. The same four questions appear at every level, but the expected answer is completely different each time:
- "What if the same URL is submitted twice?"
- "How would you handle a URL that goes viral?"
- "How do you expire links without a full table scan?"
- "Walk me through a cache miss."
See Section 11 for exactly how the expected answer shifts from L3/L4 through L7/L8.
Requirements clarification
The first few minutes of a system design interview are the highest-leverage time you have. The requirements you establish here, and the ones you explicitly defer, shape every tradeoff you'll make for the rest of the conversation.
Functional requirements
These are the things the system does. Two are non-negotiable; the rest are worth raising as optional to signal breadth.
| Requirement | Notes |
|---|---|
| Shorten a long URL to a 7–8 character code | Core write path |
| Redirect a short URL to the original | Core read path, must be fast |
| Custom aliases | Optional; mention at L4+ |
| URL expiration | Optional; important for cleanup design |
| Analytics (click counts, geo, referrer) | Drives the 301 vs 302 decision later |
Non-functional requirements
| Property | Target |
|---|---|
| Availability | 99.99% (redirect failures are customer-visible) |
| Redirect latency | <10 ms p99 on cache hit; <50 ms p99 on cache miss |
| Durability | No silent loss of URL mappings |
| Analytics consistency | Eventually consistent. An async pipeline is fine. |
| Scale | 1M new URLs/day; 100:1 read/write ratio |
NFRs are where most of the interesting design tension lives. Each target above is a deliberate choice; the sections below unpack the reasoning behind each one, and what it directly drives in the architecture.
Availability: 99.99% (asymmetric across read and write)
99.99% means roughly 52 minutes of downtime per year. The justification is asymmetric: a redirect failure is immediately visible to an end user, meaning they clicked a link and got nothing. A creation failure is far less damaging; users can retry, or the failure is invisible entirely (a background process that tries again).
This distinction matters for the architecture. It means the read path needs more redundancy than the write path. Multiple read replicas, a warm cache, and a load balancer with fast health checks are all direct responses to this requirement. The write path, which at 11.5 QPS is barely a whisper, can tolerate a brief primary failover without meaningful user impact.
Redirect latency: <10 ms p99 on cache hit (the number that justifies the entire cache layer)
10 ms is aggressive. A round-trip to a database in the same data centre is typically 1–5 ms under ideal conditions, but p99 latency accounts for the worst 1% of requests: slow queries, lock contention, garbage collection pauses. Hitting the DB on every redirect makes it very hard to stay under 10 ms at the 99th percentile.
Redis, by contrast, responds in under 1 ms for a simple key lookup. That headroom (1 ms for the cache lookup, plus network overhead) is why the <10 ms target effectively mandates a cache. Without it, the target is aspirational rather than achievable.
The <50 ms target on a cache miss is more relaxed by design. A miss means a DB read, which is slower but still expected to be rare (80%+ of requests should hit the cache). Setting a separate, looser target for misses is honest engineering: you're acknowledging that the two paths have different cost profiles rather than pretending they're the same.
Durability: no silent loss of URL mappings (asymmetric with analytics)
A lost URL mapping is catastrophic in a way that a lost click event is not. If short.ly/abc123 disappears from the database, every future user who clicks that link gets a 404. There is no recovery. The original long URL is gone. This is why URL mappings must be stored in a durable, replicated database with write-ahead logging, not in Redis (which is a cache, not a database, despite being able to persist data).
The framing "no silent loss" is deliberate. It means a write must not return success unless the data is safely persisted. An acknowledged write that later disappears due to async replication lag on a primary failure is the failure mode to design against. This points toward synchronous replication or quorum writes for the URL mapping table.
Analytics consistency: eventually consistent is fine (a deliberate relaxation)
This is an explicit relaxation, and stating it out loud is just as important as stating the strict requirements. Not every part of the system needs the same guarantees. If a dashboard shows 9,842 clicks instead of 9,847 because five recent events are still in the Kafka pipeline, no user is harmed. The number will catch up within seconds.
This relaxation is what makes the async analytics architecture acceptable. If click counts needed to be exact and real-time, you'd need synchronous writes on the redirect path, which would blow the latency budget. By establishing eventual consistency as acceptable here, you unlock the entire async pipeline design: Kafka → Flink → ClickHouse, with the redirect path never waiting for any of it.
The key skill is knowing which parts of a system can tolerate relaxed consistency. URL mappings cannot. Click counts can. Stating that distinction explicitly and tracing it back to user impact is what separates a senior answer from a junior one.
Scale: 1M URLs/day, 100:1 read:write (an anchor, not a ceiling)
Treating scale as a fixed assumption rather than a range is a common mistake. 1M URLs/day and 100:1 is a reasonable starting anchor. It's the kind of number you'd get from a mid-size product. But the interviewer may nudge you toward 100M URLs/day, or toward a single viral event generating 50,000 QPS for 10 minutes. The architecture should be defensible across that range, not brittle at the edges.
The read:write ratio is the more structurally important number. At 10:1, you'd still care about performance but might question whether a cache layer is worth its complexity. At 100:1, the cache is obvious. At 1000:1 (a single viral URL), the cache becomes the only thing standing between you and a DB meltdown. Naming this ratio early anchors every caching discussion that follows.
A useful pattern: establish the baseline (1M/day, 100:1), then proactively mention how the design degrades and recovers at 10× scale. This shows you've thought beyond the happy path without needing the interviewer to push you there.
Out of scope (say it clearly): user authentication, billing, abuse detection, malware scanning. Mention that you'd address these given more time. Not raising them at all signals that you haven't thought about the system's full surface area.
Capacity estimation
The point of capacity estimation isn't to get a precise number. It's to understand the character of the system before you design it. A few rough calculations will tell you whether you're dealing with a read-heavy or write-heavy workload, whether storage fits on one machine or needs sharding, and how big the cache needs to be to actually help. Those answers shape every decision that follows.
There are four dimensions worth thinking through: traffic, storage, bandwidth, and cache. Each is worked through below.
Traffic: QPS and the read/write split (the most revealing number)
Start with what you can anchor to: how many new URLs are created per day? 1 million is a reasonable starting assumption for a mid-scale system. Converting to per-second is a one-step calculation: 1M ÷ 86,400 ≈ 11.5 writes/sec. That's modest. A single DB node handles this without breaking a sweat.
The more interesting number is reads. Ask yourself: for every URL created, how many times is it clicked? For a general-purpose shortener, 100:1 is a reasonable ratio. A marketing campaign URL might be 10,000:1. That gives you 1,150 reads/sec on average, but traffic is never flat. Apply a 5× peak multiplier for spikes (a tweet goes viral, a campaign launches), which puts peak reads at ~5,750/sec.
Why this matters: at 100:1, this system is almost entirely reads. That single insight justifies the entire caching strategy. If reads were only 2:1, you'd think about the design very differently.
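The arithmetic above can be rerun as a quick sketch with different assumptions (the inputs are the assumed figures, not measured values):

```javascript
// Back-of-envelope traffic model; inputs are assumptions, not measurements.
const urlsPerDay = 1_000_000;
const readWriteRatio = 100;
const peakMultiplier = 5;

const writesPerSec = urlsPerDay / 86_400;             // ~11.6 writes/sec
const readsPerSec = writesPerSec * readWriteRatio;    // ~1,157 reads/sec average
const peakReadsPerSec = readsPerSec * peakMultiplier; // ~5,787 reads/sec at peak

console.log({ writesPerSec, readsPerSec, peakReadsPerSec });
```

Swap in 100M URLs/day or a 1000:1 ratio and the same three lines tell you whether the design still holds.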
Storage: does this fit on one machine? (usually less scary than it sounds)
Each URL row needs: the short code (~8 bytes), the long URL (average ~200 bytes for a real URL, but allow 500 bytes including metadata columns). At 1M rows/day for 5 years: 1M × 365 × 5 = 1.8 billion rows, totalling roughly 900 GB.
900 GB fits comfortably on a single modern database server. This is an important realisation: you don't shard for storage capacity here, you shard for availability and read throughput. Sharding a database that doesn't need it for capacity adds operational complexity for no gain; knowing this lets you justify starting with a simpler single-primary and read replica setup.
The clicks table is a different story. At 100:1, you're storing 100M click events per day, about 36 billion rows per year at just the base URL creation rate. At even 100 bytes per click row, that's 3.6 TB/year. This is why clicks needs a separate, write-optimised store from day one of the scaled design.
Bandwidth: usually not the bottleneck (worth a quick check)
For the write path: 11.5 writes/sec × 500 bytes ≈ 6 KB/sec inbound. Negligible. For the read path: a redirect response is tiny, just an HTTP 302 with a Location header, roughly 200 bytes. At 5,750 peak reads/sec that's about 1.1 MB/sec outbound. Again, not a concern.
Bandwidth becomes interesting only if you serve the destination page through the shortener (you don't), or if you add analytics pixel tracking with embedded images. For a pure redirect service, bandwidth is never the constraint. Saying so in an interview shows you've thought about it and ruled it out rather than just ignoring it.
Cache sizing: how much memory do you actually need? (smaller than you'd expect)
URL access follows a power-law distribution. A small fraction of URLs receive the vast majority of traffic. Caching the hot 20% of the working set typically yields an 80%+ cache hit rate. The question is: what does 20% actually cost in RAM?
The total corpus after 5 years is roughly 1M new URLs/day × 365 × 5 = 1.8B URLs, but on any given day only a slice of those are active. A reasonable model: cache the top 20% of daily active URLs. If 10M URLs are active on a given day, caching 2M × 500 bytes = 1 GB of Redis memory covers the hot tail. Even at 10× that scale, you're talking 10 GB. A single Redis node comfortably holds this.
Why this matters for the design: knowing the cache fits entirely in memory on one or two nodes means you can justify a simple Redis setup rather than immediately reaching for Redis Cluster. Scale the architecture to what the numbers actually demand.
The key insight these numbers reveal: this system is overwhelmingly read-heavy, storage is manageable, bandwidth is a non-issue, and the cache footprint is small. That combination (high read ratio, cheap cache, modest storage) is exactly why a simple cache-aside pattern with Redis works so well here. The architecture follows naturally from the numbers.
High-level architecture
Sketch the full overview before diving into any component: establish the complete picture, then drill down. This is the structure you should draw on the whiteboard in the first 10 minutes.
Component breakdown
Load balancer. L7 (HTTP-aware), distributes traffic across stateless API servers. Can route /create (write path) separately from /:code (read path) for independent scaling.
API servers. Fully stateless; no session affinity required. This is the core architectural choice that enables horizontal autoscaling. Every request contains everything needed to process it.
Redis cache. The most important component for read performance. With a 20% working-set cache and a power-law click distribution, ~80% of redirects are served without touching the database.
Database. Stores URL mappings durably. MySQL works at moderate scale; Cassandra or DynamoDB for large-scale write throughput and horizontal sharding.
Analytics service. Fully asynchronous. Clicks are published to Kafka, never blocking the redirect response. A stream processor (Spark/Flink) aggregates to a columnar store.
Architectural rationale
Every component is a deliberate answer to a specific constraint. The sections below walk through the reasoning behind each choice, which is what interviewers at L5+ are listening for.
① Load balancer (availability & independent scaling)
Why L7 over L4? An L7 (HTTP-aware) load balancer lets you route by URL path. The write path (POST /shorten) and read path (GET /:code) have completely different traffic patterns. Reads are 100× more frequent. L7 lets you scale them independently, routing them to separate server pools, without any client changes.
Why not DNS round-robin? DNS has no health-check awareness. A failed server receives traffic until the DNS TTL expires, which is potentially minutes. A proper load balancer detects unhealthy nodes within seconds and removes them. That responsiveness is essential for a 99.99% availability target.
② API servers (statelessness is the key decision)
Why stateless? A stateless server holds no session data. Every request carries everything needed to process it. This is the single architectural decision that makes horizontal autoscaling trivial: spin up more servers, point the load balancer at them, done. No sticky sessions, no session replication, no coordination overhead.
What state is explicitly excluded? No in-flight request state, no user sessions, no per-connection counters. The only "state" is the pre-fetched ID range in memory (counter-based encoding), which is safe to lose on restart, since the server simply fetches a new range from Zookeeper.
③ Redis cache (the highest-impact component)
Why is this the most important component? URL access follows a heavy power-law distribution, where a small fraction of URLs (viral links, popular campaigns) receives the vast majority of clicks. Caching 20% of the working set by count typically yields 80%+ hit rates. Without cache, every redirect hits the DB. With it, most complete in ~1 ms with no DB I/O whatsoever.
Why Redis over Memcached? Redis supports native TTL per key (critical for URL expiration), persistence options (RDB/AOF snapshots for warm-up after restart), and Cluster mode for horizontal sharding. Memcached is marginally faster for pure string caching but lacks all three of these operationally important features.
Cache invalidation on delete: issue a DEL short_code to Redis alongside the DB delete, so a deleted URL can't continue to be served from cache.
④ Database (durability anchor of the system)
Why not use Redis as the primary store? Redis is an in-memory store. Data loss on failure is possible without careful persistence configuration, and even then it lags behind a relational DB in durability guarantees. The database is the source of truth; the cache is a performance optimisation layered on top of it.
MySQL vs Cassandra: when to switch? MySQL handles hundreds of millions of URL records comfortably on a single node with read replicas. Reach for Cassandra (or DynamoDB) when you need multi-region active-active writes, or when manual sharding in MySQL becomes operationally painful. For most interview scenarios, starting with MySQL + read replicas is correct; jumping straight to Cassandra invites difficult questions about tunable consistency.
⑤ Analytics service (why async is non-negotiable)
Why not log clicks synchronously in the redirect handler? A redirect must complete in <10 ms. Even a fast synchronous DB write adds 5–15 ms on every redirect. At 1,000 redirects/sec that's significant load; at 50,000/sec after a viral link, it becomes a bottleneck that degrades the entire service for all users. Decoupling with Kafka means the redirect path never waits on analytics.
Why Kafka over a background thread writing to the DB? Kafka is a durable buffer. If the consumer falls behind during a traffic spike, events queue safely with no data loss and no backpressure on API servers. A background thread writing directly to the DB has no such buffer; if the DB slows down, the thread queue grows in the API server's memory until OOM.
Alternatives worth naming:
- Redis INCR (fast counts, but lossy and hard to enrich with geo/referrer)
- Direct DB write (simpler, but couples analytics to the redirect path)
- AWS Kinesis / GCP Pub/Sub (managed Kafka alternatives)
What the interviewer is really asking: "Walk me through why you made each of these choices." Candidates who can articulate the reasoning behind a component, not just what it does, are the ones who pass at L5+.
How this compares to Bitly and TinyURL
Both are production systems at significant scale, and their known architectures validate several of the choices above.
| Decision | Our design | Bitly (known) | TinyURL (known) |
|---|---|---|---|
| Short code generation | Counter + Base62, Zookeeper ranges | Custom hash + Base58 (avoids visually ambiguous chars like 0/O) | Sequential counter, Base36 for older links |
| Primary datastore | MySQL + read replicas → Cassandra at scale | Cassandra for URL mappings (known from engineering blog) | MySQL (still relational at their scale) |
| Caching | L1 local + L2 Redis | Redis + CDN edge caching for top URLs | In-process cache + Redis |
| Redirect type | 302 default, 301 opt-in | 301 for branded links (SEO benefit for customers); 302 for tracked links | 301 by default (analytics less central to their model) |
| Analytics pipeline | Kafka → Flink → ClickHouse | Kafka-based event pipeline, proprietary analytics layer | Basic click counts; no real-time analytics product |
Bitly's choice of 301 for branded links is worth noting. It's a deliberate product decision that trades click analytics for SEO link equity passing to the destination. This is the same tradeoff our design exposes as an opt-in. There's no universally correct answer; the right choice follows from requirements.
Encoding strategies
The core algorithmic question: given a long URL, how do you generate a short, unique, collision-free code? There are two dominant approaches: Approach A hashes the long URL (e.g. a truncated MD5/SHA digest), and Approach B encodes a globally unique counter-issued ID in Base62.
Recommended: Approach B with Zookeeper range allocation
Each API server pre-fetches a range of IDs (e.g., 1,000–2,000) from Zookeeper at startup. It burns through them locally without any network call. When the range is exhausted, it fetches the next range. This removes Zookeeper from the hot path entirely. The critical redirect path never touches it.
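A minimal sketch of the range-allocation pattern. `fetchRangeFromCoordinator` is a hypothetical stand-in for the Zookeeper call (in practice, an atomically incremented znode):

```javascript
// Stand-in for the coordinator call that atomically reserves
// the next block of IDs; nextRangeStart simulates its persisted counter.
let nextRangeStart = 1000;
function fetchRangeFromCoordinator(size) {
  const start = nextRangeStart;
  nextRangeStart += size;
  return { next: start, end: start + size }; // half-open range [start, start + size)
}

class IdAllocator {
  constructor(rangeSize = 1000) {
    this.rangeSize = rangeSize;
    this.range = fetchRangeFromCoordinator(rangeSize);
  }
  nextId() {
    // Hot path is a pure in-memory increment; the coordinator is
    // contacted only once per rangeSize IDs.
    if (this.range.next === this.range.end) {
      this.range = fetchRangeFromCoordinator(this.rangeSize);
    }
    return this.range.next++;
  }
}
```

A server that crashes loses at most its unissued range, which wastes some codes but can never issue a duplicate.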
// Base62 encoding
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
function toBase62(id) {
let result = '';
while (id > 0) {
result = ALPHABET[id % 62] + result;
id = Math.floor(id / 62);
}
return result.padStart(7, '0');
}
// ID=12345 → "00003d7", ID=3,521,614,606,207 → "zzzzzzz"
62⁷ = 3,521,614,606,208 unique codes, over 3.5 trillion. At 1 billion new URLs per day, this lasts nearly 10 years before you need an 8-character code.
One open question: what if the same URL is submitted twice?
Approach A (hash) handles this naturally. The same long URL always produces the same hash, so the same short code is returned. No extra work needed.
Approach B (counter) does not. Two submissions of https://example.com/same-url will get two different IDs and two different short codes. Whether this matters depends on your product requirements. If it doesn't (many shorteners intentionally allow multiple short codes per long URL, e.g. for tracking different campaigns), ignore it. If deduplication is needed, maintain a separate long_url → short_code lookup table with a unique index on long_url. This adds one DB read to every creation request: check the dedup table first, return the existing code if found, otherwise create a new one. At 11.5 writes/sec the overhead is trivial; at millions of writes/sec you'd shard the dedup table on a hash of the long URL.
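The dedup path can be sketched as follows, with a Map standing in for the `long_url → short_code` table and a placeholder encoder in place of the real `nextId()` + `toBase62()` pair:

```javascript
// Map stands in for a table with a UNIQUE index on long_url.
const dedupIndex = new Map();
let counter = 0;
const newCode = () => `c${++counter}`; // placeholder for nextId() + toBase62()

function createShortCode(longUrl) {
  const existing = dedupIndex.get(longUrl); // the one extra read per create
  if (existing) return existing;
  const code = newCode();
  // In a real DB, the unique index resolves the race between two
  // concurrent creates of the same long URL; one insert wins.
  dedupIndex.set(longUrl, code);
  return code;
}
```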
API design
The two endpoints are simple, but the design choices within each one are worth articulating.
The two core endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /shorten | POST | Create a new short URL |
| /:code | GET | Redirect to the original URL |
POST /shorten: create a short URL
Request:
// POST /shorten
{
"long_url": "https://example.com/very/long/path?with=params",
"custom_alias": "my-link", // optional
"expires_at": "2025-12-31T00:00:00Z" // optional
}
Response:
// 201 Created
{
"short_url": "https://short.ly/abc123",
"short_code": "abc123",
"long_url": "https://example.com/very/long/path?with=params",
"expires_at": "2025-12-31T00:00:00Z",
"created_at": "2025-03-28T10:00:00Z"
}
Why POST and not GET for URL creation? Idempotency and caching ›
GET requests are by definition idempotent and cacheable. Browsers, proxies, and CDNs feel free to cache them. A creation operation is neither. Submitting the same GET request twice could create two short codes (with Approach B). Using POST signals to the entire HTTP stack that this request has side effects and must not be cached or replayed automatically.
The response is 201 Created rather than 200 OK, a small detail that signals a resource was created and allows the Location header to point to the new resource if needed.
Input validation: what to check Security and correctness ›
Four things to validate on every creation request, in order of importance:
URL validity. Reject malformed URLs that would never resolve. Check for valid scheme (http:// or https://), reject bare IP addresses if you want to block direct IP linking, enforce a maximum length (URLs over ~2,000 chars cause issues in some browsers).
Custom alias availability and safety. Check against reserved words (api, health, admin, blog), profanity blocklist, and existing aliases. Return 409 Conflict if taken.
Rate limiting. Limit creation requests per API key and per IP. URL shorteners are frequently abused for spam and phishing; without rate limiting a single actor can generate millions of short codes.
Expiry sanity. If expires_at is provided, reject values in the past and values beyond a maximum retention window (e.g. 10 years).
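These checks can be sketched as a single validator; the length limit, reserved list, and return shape are illustrative assumptions, not canonical values:

```javascript
// Illustrative validation sketch; limits and reserved words are assumptions.
const RESERVED = new Set(['api', 'health', 'admin', 'blog']);
const MAX_URL_LENGTH = 2000;

function validateCreateRequest({ long_url, custom_alias, expires_at }) {
  let url;
  try {
    url = new URL(long_url);               // rejects malformed URLs
  } catch {
    return { ok: false, error: 'malformed URL' };
  }
  if (!['http:', 'https:'].includes(url.protocol)) {
    return { ok: false, error: 'unsupported scheme' };
  }
  if (long_url.length > MAX_URL_LENGTH) {
    return { ok: false, error: 'URL too long' };
  }
  if (custom_alias && RESERVED.has(custom_alias)) {
    return { ok: false, error: 'alias is reserved' }; // 409 Conflict in the API
  }
  if (expires_at && new Date(expires_at) <= new Date()) {
    return { ok: false, error: 'expiry in the past' };
  }
  return { ok: true };
}
```

Rate limiting sits outside this function, at the gateway or middleware layer, keyed by API key and IP.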
GET /:code: redirect to the original URL
// GET /abc123
// → 302 Found
// Location: https://example.com/very/long/path?with=params
// Cache-Control: no-store (prevents browser caching for analytics)
// → 404 Not Found (code doesn't exist or has expired)
// → 410 Gone (code existed but was deliberately deleted)
410 Gone vs 404 Not Found. A small detail that signals senior thinking. A 404 means "never heard of it." A 410 means "this existed and was intentionally removed." For deleted short URLs, 410 is the semantically correct response and tells crawlers to permanently remove the URL from their index. Most implementations just use 404 everywhere. Mentioning 410 is a differentiator.
Optional endpoints worth mentioning
| Endpoint | Method | Purpose | Level |
|---|---|---|---|
| /urls/:code | GET | Retrieve metadata for a short URL (owner only) | L4 |
| /urls/:code | DELETE | Delete a short URL | L4 |
| /urls/:code/analytics | GET | Click stats: counts, geo breakdown, referrers | L5 |
| /urls | GET | List all short URLs for the authenticated user | L5 |
| /bulk/shorten | POST | Create multiple short URLs in one request | L6 |
Redirect flow & 301 vs 302
The redirect is the highest-volume operation in the system. Every millisecond saved here compounds across billions of requests. The choice between 301 and 302, first flagged in the analytics NFR in Section 2, gets resolved here.
The 301 vs 302 decision: closing the loop from Section 2
In Section 2 we established analytics as a requirement and flagged it as eventually consistent: clicks need to be recorded, but not instantly. That single NFR resolves this choice.
| Status code | Browser behavior | Server load | Analytics |
|---|---|---|---|
| 301 Permanent | Caches redirect locally. Future clicks bypass your server | Minimal after first visit | ❌ Clicks not logged. Browser redirects silently |
| 302 Temporary | Re-requests on every click. No local caching | Every click hits your server | ✓ Full analytics. Every click is a server request |
The decision: analytics is a stated requirement, so 302 is the only valid default. Using 301 would silently discard click data after the first visit per browser. Offer 301 as an explicit opt-in per URL for users who want maximum redirect speed and have no need for click analytics. This is also the correct answer to "why not just use 301 for performance?" You'd be trading a stated requirement for a marginal latency saving on repeat visitors.
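The decision collapses into a small handler sketch; the record shape and the `permanentRedirectOptIn` flag are assumptions for illustration:

```javascript
// 302 by default so every click reaches the server; 301 only when the
// URL owner explicitly opted out of analytics. `record` is an assumed row shape.
function redirectResponse(record) {
  if (!record) return { status: 404 };            // never existed (or expired and purged)
  if (record.deleted) return { status: 410 };     // existed, intentionally removed
  if (record.expires_at && record.expires_at < Date.now()) return { status: 404 };
  return {
    status: record.permanentRedirectOptIn ? 301 : 302,
    headers: {
      Location: record.long_url,
      'Cache-Control': 'no-store', // keep browsers re-requesting, for analytics
    },
  };
}
```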
Data model
Don't reach for a schema immediately. Start by identifying the entities, then reason about the access patterns. The schema follows naturally from those two things.
Step 1: What entities do we have?
Think about the nouns in the system. A URL mapping is the core entity: a short code pointing to a long URL. A user may own that mapping (for management, analytics, custom aliases). A click event captures each redirect for analytics. That's it. Three entities, no more.
Step 2: What are the access patterns?
Each entity gets used very differently, and those differences end up shaping the schema more than anything else:
| Operation | Frequency | Query shape |
|---|---|---|
| Redirect: look up long URL by short code | Extremely high (every click) | Point lookup by short_code |
| Create: insert a new short URL | Low (100× less than reads) | Single-row insert |
| Analytics: count/group clicks per URL | Medium (dashboard queries) | Range scan by url_id + time |
| Owner lookup: list URLs by user | Low (management UI only) | Range scan by user_id |
| Expiry sweep: find expired URLs | Very low (background job) | Range scan by expires_at |
Step 3: What does this tell us about the schema?
Two things jump out immediately. First, the redirect lookup dominates every other operation by two orders of magnitude. The short_code must be indexed, and that index must stay small and fast. Second, click data has completely different access patterns from URL data: it is append-only, queried by time range, and will grow without bound. These two things should not live in the same table, and at scale they should not live in the same database.
A common early mistake is putting click counts directly on the urls row as a counter column. This creates a write hotspot. Every redirect hammers the same row with an UPDATE. Separate the concerns: keep urls append-mostly (created once, rarely updated) and handle click aggregation in the analytics pipeline.
The schema that follows from this reasoning
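As a sketch, the reasoning above lands on something like the following DDL. Column names beyond those discussed in this section (long_url, user_id, created_at, referrer) are illustrative, not canonical:

```sql
-- Representative sketch; types follow the discussion in this section.
CREATE TABLE urls (
  id          BIGINT PRIMARY KEY,         -- internal identity, decoupled from encoding
  short_code  VARCHAR(8) NOT NULL UNIQUE, -- external lookup key, the hottest index
  long_url    TEXT NOT NULL,
  user_id     BIGINT NULL,
  created_at  TIMESTAMP NOT NULL,
  expires_at  TIMESTAMP NULL,             -- NULL means "never expires"
  click_count BIGINT NOT NULL DEFAULT 0   -- denormalised, updated in batches
);

CREATE TABLE clicks (
  url_id     BIGINT NOT NULL,
  clicked_at TIMESTAMP NOT NULL,
  ip_hash    CHAR(64),                    -- salted hash, never the raw IP
  referrer   TEXT
);
CREATE INDEX idx_clicks_url_time ON clicks (url_id, clicked_at);
```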
Why each field and index?
The schema is a conclusion, not a given. The subsections below explain why each column and index was chosen, and what would break without it.
urls.short_code (VARCHAR(8) UNIQUE): the most critical column
This column carries the most important index in the system. Every redirect, the dominant operation, is a point lookup by short_code. A unique B-tree index on it gives O(log n) lookups in the worst case, but with a hot working set in the buffer pool, it is effectively O(1) in practice.
Why VARCHAR(8) not VARCHAR(255)? Index pages are fixed-size. Narrower columns pack more entries per page, which means more of the index fits in memory. For a table that could hold hundreds of millions of rows, this difference matters. 8 characters comfortably covers a 7-character Base62 code with one spare byte.
Why store both id and short_code? The id is the internal primary key used for joins (e.g., clicks.url_id). The short_code is the external lookup key. Keeping them separate means your internal row identity is decoupled from the URL encoding scheme. If you ever change encoding strategies, you don't need to rewrite foreign keys across the database.
urls.expires_at (TIMESTAMP NULL): nullable by design
Most URLs never expire, so the column is nullable. A NULL value means "no expiry". This avoids storing a sentinel value like 9999-12-31 which looks awkward and can confuse application code.
Do not index this column on the main table. Expiry sweeps are an infrequent background job; they don't need sub-millisecond performance. Adding an index on expires_at would slow down every insert for negligible gain. If sweep performance becomes a concern, run the job against a replica, or maintain a separate lightweight table of (short_code, expires_at) that is cheap to scan.
How does this interact with the cache? When a URL is created with an expiry, the cache TTL should be set to min(24h, expires_at − now). This ensures the cache entry naturally falls away when the URL expires, without needing a proactive invalidation sweep.
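That TTL rule is a one-liner worth writing out (a sketch; the 24h default is the assumption stated above):

```javascript
// Cache TTL: cap at 24h, but never outlive the URL's own expiry.
const DAY_SECONDS = 24 * 60 * 60;

function cacheTtlSeconds(expiresAtMs, nowMs = Date.now()) {
  if (expiresAtMs == null) return DAY_SECONDS;            // no expiry: default TTL
  const untilExpiry = Math.floor((expiresAtMs - nowMs) / 1000);
  return Math.max(0, Math.min(DAY_SECONDS, untilExpiry)); // min(24h, expires_at - now)
}
```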
urls.click_count (BIGINT): a deliberate denormalisation
Strictly speaking, this is redundant. You can derive the total click count with SELECT COUNT(*) FROM clicks WHERE url_id = ?. But at scale that query becomes expensive if the clicks table has billions of rows. A denormalised counter on the urls row enables instant display of a click count in the UI.
Why not update it synchronously on every redirect? This is the counter hotspot problem. If a URL receives 10,000 clicks per second, you'd have 10,000 concurrent UPDATE urls SET click_count = click_count + 1 statements fighting for a row lock. The right approach is to update it periodically via the analytics pipeline, batch-aggregate click events in Kafka, then apply a single UPDATE once per minute per URL. Eventual consistency is acceptable here; users don't need sub-second click count precision.
Expose the staleness explicitly (e.g. alongside a count_updated_at timestamp). Callers who need exact real-time counts should query the analytics store directly.
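The batch-aggregation step is simple to sketch. The event shape is assumed; in production the input would be a window of events consumed from Kafka:

```javascript
// Aggregate a window of click events, then emit one update per URL
// instead of one per click (avoiding the counter hotspot).
function aggregateClicks(events) {
  const counts = new Map();
  for (const { url_id } of events) {
    counts.set(url_id, (counts.get(url_id) || 0) + 1);
  }
  // Each entry becomes a single statement, e.g.:
  // UPDATE urls SET click_count = click_count + ? WHERE id = ?
  return [...counts].map(([urlId, delta]) => ({ urlId, delta }));
}
```

10,000 clicks on one URL in a window become one UPDATE with delta 10,000, not 10,000 contending row locks.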
The clicks table: why it's separate (different access patterns, different store)
clicks is append-only and grows without bound. A URL shortener with 1M new URLs/day and a 100:1 read:write ratio generates 100M click events per day (about 1,000 rows per second, continuously). Within a year that's 36 billion rows. That volume is incompatible with a traditional RDBMS table sitting alongside the operationally critical urls table.
The solution: separate the store at the infrastructure level. At L4, putting clicks in a separate table in the same DB is acceptable. At L5/L6, it belongs in a write-optimised wide-column store (Cassandra, Bigtable) or streamed directly to a columnar analytics store (ClickHouse). At L7/L8, raw click events live in S3 Parquet partitioned by date, queryable via Athena or BigQuery, with ClickHouse serving as the hot-query layer for recent data.
Why store ip_hash not ip_address? Raw IP addresses are PII under GDPR and CCPA. Storing a one-way hash of the IP (salted and rotated) lets you perform deduplication (unique visitors) without storing the personal data itself. Mention this proactively. It signals awareness of compliance requirements.
One tradeoff to name: moving clicks into a different store means you lose foreign-key guarantees, making orphaned click records for deleted URLs possible. Handle this at the application layer: when a URL is deleted, publish a tombstone event to the analytics pipeline so downstream consumers can filter or purge the associated clicks.
Index on clicks(url_id, clicked_at): composite ordering matters
The dominant analytics query is: "give me all clicks for URL X between time A and time B." A composite index (url_id, clicked_at) handles this perfectly. The leftmost column (url_id) filters down to a specific URL's rows, and the rightmost column (clicked_at) supports efficient range scans within that set.
Why not (clicked_at, url_id)? Reversing the order would make the index useful for "give me all clicks across all URLs in a time range", a much less common query. Always index in the order that matches your most frequent access pattern: filter first, range second.
At large scale, this index itself becomes enormous. When the clicks table moves to Cassandra, the partition key becomes url_id and the clustering column becomes clicked_at DESC, achieving the same access pattern but distributed across nodes automatically.
Schema decisions by level
L3/L4 Identify the three entities, list the access patterns, arrive at the schema. Explain the short_code unique index and why it is the most important one.
L5/L6 Separate clicks into a write-optimised store. Discuss sharding urls on short_code via consistent hash. Mention the ip_hash vs raw IP tradeoff.
L7/L8 Discuss the clicks archival pipeline: hot tier (ClickHouse, last 30 days) vs cold tier (S3 Parquet, partitioned by month). Raise the dual-write risk during migration and how to handle it with a feature flag and backfill job. Discuss tombstone events for deleted URLs.
Caching strategy
Sections 3 and 4 established why the cache exists. The 100:1 read ratio and the <10 ms latency target make it non-optional. This section covers the how: how many layers, what belongs in each, how eviction works, and what to do when data changes.
What gets cached
The short_code → long_url mapping. Small (~500 bytes), written once, rarely updated, and queried on every single redirect. It is close to a perfect cache candidate.
Cache hierarchy
| Layer | Location | Hit latency | Size | Eviction |
|---|---|---|---|---|
| L1 (local) | Each API server process memory | ~0.1 ms | ~1,000 entries | TTL = 1 second |
| L2 (distributed) | Redis Cluster | ~1 ms | 20% of working set | LRU + TTL = 24h |
| L3 (database) | MySQL / Cassandra | ~10 ms | Full dataset | — |
Expand the sections below for the reasoning behind each layer.
Why L1 (local) in addition to L2 (Redis)? Not redundancy: different failure modes
Redis handles most traffic well, but a single viral URL can saturate the network connection to the specific Redis shard holding that key, even if Redis itself is healthy. The L1 local cache on each API server handles this case: requests served from L1 never touch the network at all. With a 1-second TTL, each API server effectively rate-limits its own outbound Redis traffic to at most one request per key per second, regardless of incoming load.
The two layers also fail independently. If Redis goes down, L1 continues serving hot entries for up to 1 second, during which a circuit breaker can activate and route misses directly to the DB rather than hammering a failing Redis. Without L1, a Redis failure immediately becomes a DB flood.
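The two-tier lookup can be sketched as follows (a hypothetical class; plain dicts stand in for the Redis client and the database, and the 1-second L1 TTL mirrors the table above):

```python
import time

class TwoTierCache:
    """L1: per-process dict with a 1-second TTL.
    L2: a Redis-like store (a dict here, standing in for a client).
    L3: the database (also a dict in this sketch)."""

    def __init__(self, redis, db, l1_ttl=1.0):
        self.redis, self.db = redis, db
        self.l1 = {}          # short_code -> (long_url, fetched_at)
        self.l1_ttl = l1_ttl

    def get(self, code):
        hit = self.l1.get(code)
        if hit and time.monotonic() - hit[1] < self.l1_ttl:
            return hit[0]                  # L1 hit: no network at all
        url = self.redis.get(code)         # L2: one Redis round trip
        if url is None:
            url = self.db[code]            # L3: database fallback
            self.redis[code] = url         # repopulate L2
        self.l1[code] = (url, time.monotonic())
        return url
```

However many requests arrive for one key, each process issues at most one Redis read per key per second.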
LRU eviction vs TTL: why both? Different problems, different tools
LRU eviction handles memory pressure: when the cache fills up, the least recently used entries are dropped to make room. TTL handles correctness: entries that haven't been evicted are still expired after 24 hours, preventing perpetually stale data from accumulating for obscure URLs that were cached long ago and never requested since.
They solve different problems and you need both. LRU alone means a long-tail URL cached six months ago stays forever, even if the underlying long URL was changed. TTL alone means a popular URL that's accessed every second still gets evicted after 24 hours and causes an unnecessary DB read.
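A minimal cache combining both mechanisms (a hypothetical sketch; `now` is injectable to make the expiry behaviour explicit rather than wall-clock dependent):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """LRU handles memory pressure; TTL handles staleness."""

    def __init__(self, capacity, ttl):
        self.capacity, self.ttl = capacity, ttl
        self.data = OrderedDict()   # key -> (value, stored_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl:     # TTL: drop stale entries
            del self.data[key]
            return None
        self.data.move_to_end(key)         # LRU: mark recently used
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.data[key] = (value, now)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity: # LRU: evict least recent
            self.data.popitem(last=False)
```

Removing either branch reproduces one of the failure modes above: no TTL check and long-tail entries go stale forever; no eviction and memory grows without bound.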
For URLs with an expiry date set by the user, override the TTL: set it to min(24h, expires_at − now). This ensures the cache entry expires no later than the URL itself, without needing a separate invalidation sweep.
Cache invalidation on write: rare but must be immediate
URL updates and deletions are uncommon; users rarely change a short URL once created. But when they do happen, the cache must be invalidated immediately or you risk serving a stale (or deleted) URL for up to the TTL duration. The correct approach is synchronous invalidation: on any write to the urls table, issue a DEL short_code to Redis as part of the same operation (not as a background job). If the DEL fails, the write should still succeed (the TTL will eventually clean it up), but log the failure and alert.
This is a case where cache invalidation is simpler than it looks. Because short_code is the exact cache key, there's no partial invalidation, no cache stampede, and no need for versioning. Delete the key; the next request repopulates it from the DB.
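The write path can be sketched as follows (a hypothetical helper; a dict's `pop` stands in for the Redis DEL, and an invalidation failure is logged rather than propagated, as described above):

```python
def update_url(db, redis, short_code, new_long_url, log=print):
    """Synchronous invalidation: the DEL happens in the write path,
    but a DEL failure must not fail the write. The 24h cache TTL
    eventually cleans up any key the DEL missed."""
    db[short_code] = new_long_url
    try:
        redis.pop(short_code, None)   # stands in for Redis DEL
    except Exception as exc:
        # Tolerate, but make noise: this window serves stale data.
        log(f"cache invalidation failed for {short_code}: {exc}")
```

The next redirect for this code misses the cache and repopulates it from the DB.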
Deep-dive scalability L5+
The high-level architecture in Section 4 works well at moderate scale. This section covers the decisions that come under pressure at 10× and 100×. These are the ones you need to reason about at Senior level and above. The diagram shows the full expanded system; the sections below explain the reasoning behind each scaling choice.
Database sharding: why and on what key? Shard key choice is the whole game
As established in capacity estimation, the urls table at moderate scale (~900 GB) fits on a single node and doesn't need sharding for capacity. You shard when read throughput exceeds what a single primary + replica set can serve, or when you need geographic distribution for latency.
The shard key is short_code, using consistent hashing. The reasoning is direct: every redirect is a point lookup by short_code, so sharding on it ensures each request goes to exactly one shard, with no cross-shard queries and no scatter-gather aggregation. The alternative (sharding on user_id) would be wrong: it optimises for the management UI (low traffic) at the expense of the redirect path (the dominant operation).
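A minimal consistent-hash ring over short_code (a sketch: MD5 and 100 virtual nodes per shard are illustrative choices, not prescriptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps each short_code to exactly one shard. Virtual nodes
    smooth the key distribution across shards."""

    def __init__(self, shards, vnodes=100):
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def shard_for(self, short_code):
        # First ring position at or after the key's hash, wrapping.
        i = bisect.bisect(self.keys, self._hash(short_code)) % len(self.ring)
        return self.ring[i][1]
```

Every redirect computes one hash and lands on one shard; adding a shard remaps only the keys adjacent to its virtual nodes rather than rehashing everything.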
Distributed ID generation: Zookeeper vs Snowflake. Keep it off the hot path
A global auto-incrementing counter is the simplest ID generator but a single point of contention at scale. Two patterns eliminate that contention:
Zookeeper range allocation: each API server pre-fetches a range of IDs (e.g. 1,000–2,000) from Zookeeper at startup. It burns through them locally with no network calls. When exhausted, it fetches the next range. Zookeeper sees only one request per thousand URLs created, effectively removing it from the hot path. The risk: if a server crashes mid-range, those IDs are lost. Given the key space is 3.5 trillion, a few thousand lost IDs is irrelevant.
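The range-allocation pattern in miniature (the `fetch_range` callback stands in for the Zookeeper round trip; all names here are hypothetical):

```python
import threading

class RangeAllocator:
    """Each API server leases a block of IDs and burns through it
    locally; only block exhaustion triggers a coordinator call."""

    def __init__(self, fetch_range):
        self._fetch_range = fetch_range
        self._lock = threading.Lock()
        self._next = self._end = 0   # empty: first next_id() fetches

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                self._next, self._end = self._fetch_range()
            self._next += 1
            return self._next - 1

# Toy coordinator handing out blocks of 1,000 IDs.
counter = {"hi": 0}
def fetch_range(block=1_000):
    lo = counter["hi"]
    counter["hi"] += block
    return lo, lo + block
```

One coordinator call per thousand IDs; a crash mid-range leaks at most one block, which is negligible against the 3.5-trillion key space.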
Snowflake IDs: a 64-bit value composed of a timestamp (41 bits), datacenter ID (5 bits), machine ID (5 bits), and per-machine sequence (12 bits). Entirely self-contained, requiring no coordination or single point of contention. The tradeoff is clock skew: if a machine's clock jumps backward, ID monotonicity breaks. Handle with a clock guard that refuses to generate IDs if the system clock regresses.
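A sketch of that layout with the clock guard (the custom epoch is arbitrary; the bit widths match the 41/5/5/12 split described above):

```python
import time

EPOCH_MS = 1_600_000_000_000  # custom epoch; arbitrary in this sketch

class Snowflake:
    """64-bit ID: 41-bit timestamp | 5-bit DC | 5-bit machine | 12-bit seq.
    Refuses to issue IDs if the clock runs backward."""

    def __init__(self, dc_id, machine_id,
                 clock=lambda: int(time.time() * 1000)):
        assert 0 <= dc_id < 32 and 0 <= machine_id < 32
        self.dc_id, self.machine_id, self.clock = dc_id, machine_id, clock
        self.last_ms, self.seq = -1, 0

    def next_id(self):
        now = self.clock() - EPOCH_MS
        if now < self.last_ms:
            raise RuntimeError("clock moved backward; refusing to issue IDs")
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF   # 4,096 IDs/ms/machine
            if self.seq == 0:                   # exhausted: wait a ms
                while now <= self.last_ms:
                    now = self.clock() - EPOCH_MS
        else:
            self.seq = 0
        self.last_ms = now
        return (now << 22) | (self.dc_id << 17) | (self.machine_id << 12) | self.seq
```

IDs are roughly time-ordered and require no coordination between machines.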
Analytics pipeline: why this sequence of components? Each stage has a specific job
The pipeline is Kafka → Flink/Spark → ClickHouse → S3, and each stage exists for a specific reason. Kafka is the durable buffer: it decouples the API servers (producers) from the analytics processors (consumers), absorbs traffic spikes without backpressure, and retains events for 7 days allowing replay if a consumer has a bug. Flink or Spark Streaming aggregates raw click events into per-URL, per-hour, per-country counts, which is the shape your analytics dashboard actually queries. ClickHouse stores the aggregated metrics and answers dashboard queries in milliseconds due to its columnar storage format. S3 Parquet holds the raw events, partitioned by date, as the long-term archive for ad hoc queries and backfill.
Why not write directly from Kafka to ClickHouse? You could, but the aggregation step in Flink is important: it reduces 100M raw click rows/day to a far smaller set of pre-computed aggregates, keeping ClickHouse fast and cheap. Without it, every dashboard query would scan billions of rows.
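The aggregation step in miniature (Python standing in for the Flink job; the event field names are hypothetical):

```python
from collections import Counter

def aggregate_window(events):
    """Reduce raw click events to (short_code, hour, country) counts,
    which is the shape the dashboard actually queries."""
    agg = Counter()
    for e in events:
        hour = e["ts"] - e["ts"] % 3600      # truncate to the hour
        agg[(e["code"], hour, e["country"])] += 1
    return agg

events = [
    {"code": "abc", "ts": 7200 + i, "country": "US"} for i in range(500)
] + [{"code": "abc", "ts": 7200, "country": "DE"}]
rows = aggregate_window(events)
# 501 raw events collapse to 2 aggregate rows.
```

This is the reduction that keeps ClickHouse scanning thousands of pre-computed rows instead of billions of raw ones.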
Geo-distribution: what breaks at global scale? Replication lag is the core tradeoff
GeoDNS routes users to the nearest regional cluster, reducing redirect latency from ~100 ms (cross-continent) to ~5–10 ms (local region). Each region gets its own Redis cache and a read replica of the DB. URL creation writes always go to the global primary. You never do active-active writes to avoid split-brain.
The failure mode to reason through: a URL is created in region A, and a user in region B clicks it 200 ms later, before the async replication has propagated the new row to region B's replica. The correct behaviour is not a 404 but a fallback: on a cache miss that also misses the local replica, the read path falls through to the global primary. This adds latency for that one request but guarantees correctness. P99 replication lag between regions is typically under 100 ms, so this fallback is rare in practice.
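The fallback read path as a sketch (dicts stand in for the regional cache, the regional replica, and the global primary):

```python
def resolve(short_code, cache, local_replica, global_primary):
    """Regional read path: cache, then local replica, then (rarely)
    the global primary when replication hasn't caught up yet."""
    url = cache.get(short_code)
    if url is not None:
        return url
    url = local_replica.get(short_code)
    if url is None:
        # Replication lag: the row may exist but hasn't reached this
        # region yet. A cross-region read trades latency for correctness.
        url = global_primary.get(short_code)
    if url is not None:
        cache[short_code] = url
    return url
```

A genuinely nonexistent code still returns nothing after the primary check, so 404s remain correct; only the freshly created, not-yet-replicated case pays the cross-region hop.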
Failure modes & edge cases
Proactively raising these signals L5+ thinking. Don't wait for the interviewer to probe; mention them as you walk through the design.
| Scenario | Problem | Solution | Level |
|---|---|---|---|
| URL expiry | Expired URLs still served until TTL expires | Background sweep job + proactive cache invalidation on expiry. Set cache TTL = min(24h, expires_at − now). | L4 |
| Same long URL submitted twice | Creates duplicate short codes | On hash approach: same code naturally. On counter approach: optional dedup table with long_url → short_code mapping. | L4 |
| Viral / hot-key URL | Single Redis key overloaded | L1 local in-process cache (1s TTL) on each API server. Reduces Redis traffic by 100–1000×. | L5 |
| Custom alias conflicts | User requests an alias that's taken or reserved | Check against a blocklist (reserved words, profanity). Rate limit per user. Return 409 on conflict. | L5 |
| Malicious URLs | Short URL masking phishing / malware | Async Google Safe Browsing API check on creation. Flag URL, show warning interstitial page. Never block the write path synchronously. | L5/L6 |
| Redis cache failure | All traffic falls through to DB | Circuit breaker on the cache layer. DB can handle the fallback load with read replicas. Alert and restore quickly. | L6 |
| Analytics data loss | Kafka consumer lag / consumer crash | Kafka retention of 7 days. Consumer group offset management. Idempotent click writes (dedup on click_id). | L7 |
| Multi-region consistency | Redirect arrives before replication completes | Fall back to global primary on regional cache miss. Accept eventual consistency on reads; never accept it on writes. | L7/L8 |
How to answer by level
The same question, "design a URL shortener", has a different correct answer depending on the level being evaluated. What changes isn't the system; it's the frame. Lower levels are evaluated on correctness and coverage. Higher levels are evaluated on judgment, tradeoff reasoning, and how independently you drive the conversation.
L3 / L4 (SDE I / SDE II): Can you build a working system?
Strong signals:

- Arrives at the high-level architecture without being led
- Chooses Base62 encoding and explains the math
- Identifies short_code as the critical index
- Puts Redis in front of the DB without being asked
- Resolves 301 vs 302 with a clear justification
- Produces plausible capacity numbers

Red flags:

- No cache layer, DB handles all reads
- Analytics written synchronously in the redirect path
- Hash collisions not addressed
- No index discussion at all
- 301 vs 302 not mentioned
- Skips capacity estimation entirely
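The Base62 math named in the signal list is only a few lines (the alphabet ordering is a convention, not a requirement; 62^7 ≈ 3.5 trillion codes fit in 7 characters):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    """Encode a numeric ID (e.g. from a counter or Snowflake) as a
    short code by repeated division by 62."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

def decode_base62(s):
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```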
L5 (Senior SDE): Do you understand the tradeoffs you made?
- Identifies tradeoffs in their own design unprompted
- Explains why MySQL before Cassandra, not just which
- Raises hot-key problem and the L1 cache solution
- Designs the Kafka analytics pipeline end-to-end
- Discusses sharding strategy and shard key choice
- Connects NFRs back to specific architectural choices
- L4 names components; L5 justifies them
- L4 waits to be asked about edge cases; L5 raises them
- L4 picks Cassandra because it "scales"; L5 explains when not to
- L4 describes the cache; L5 reasons about hit rates and eviction
- L4 knows 302 is correct; L5 connects it back to the analytics NFR
L6 (Staff SDE): Can you own this system end-to-end?
- Surfaces hidden requirements before the interviewer does: GDPR, multi-tenancy, abuse
- Discusses operational concerns: deploy strategy, SLOs, alerting, on-call burden
- Proposes a migration path: how you'd move from v1 to v2 safely
- Identifies organisational dependencies (who owns the analytics store?)
- Reasons about the system under failure, not just the happy path
- Drives the conversation and asks better questions than the interviewer
- L5 designs the system; L6 designs the system and its lifecycle
- L5 raises edge cases; L6 raises cross-cutting concerns across teams
- L5 answers the interviewer's questions well; L6 changes the questions
- L5 discusses operational topics when asked; L6 introduces them naturally
L7 / L8 (Principal / Distinguished, Google L8): Should we be building this, and how?
- Challenges the requirements themselves: why 99.99%, what's the cost of each additional nine?
- Reframes the problem space: custom domains, white-label, multi-region regulatory partitioning
- Introduces long-horizon thinking: 5-year capacity, platform extensibility, API versioning
- Connects technical decisions to business outcomes explicitly
- Identifies the highest-risk assumptions and proposes how to validate them early
- Discusses build vs buy vs adopt (managed Kafka vs Kinesis vs in-house)
- L6 owns a system; L7/L8 shapes what systems get built
- L6 proposes good solutions; L7/L8 questions whether the solution space is right
- L6 thinks about SLOs; L7/L8 thinks about the business model behind the SLO
- L6 designs for current scale; L7/L8 designs for optionality at future scale
- The interviewer at this level is a peer. The conversation is collaborative, not evaluative
Classic probes: how the expected answer shifts by level
The same question has a different correct answer depending on who's being evaluated. The depth of reasoning is the signal, not the conclusion.
| Probe | L3/L4 | L5/L6 | L7/L8 |
|---|---|---|---|
| "What if the same URL is submitted twice?" | Describes the dedup table: same long URL should return the same short code | Explains why the hash approach handles it naturally and the counter approach doesn't; discusses the storage cost of the dedup index on long_url | Raises whether dedup is the right default at all. Multiple codes per URL is intentional for campaign tracking. Frames it as a product decision, not just a technical one |
| "What's the biggest bottleneck?" | The database, as it has to serve every redirect | The DB at low scale (solved by caching), then ID generation at high scale (solved by Zookeeper ranges or Snowflake) | Depends on traffic shape; explains the analysis before answering. Then reframes: at true scale the bottleneck is often organisational (who owns the on-call, which team controls the analytics store) |
| "How do you expire URLs without a full table scan?" | Background job that sweeps for expired rows; set cache TTL to min(24h, expires_at − now) | Run the sweep against a read replica to avoid impacting the primary; maintain a lightweight (short_code, expires_at) index table to make the sweep cheap | Questions whether expiry is enforced at the DB layer at all. At scale, soft-delete with async cleanup via a Kafka tombstone event is more operationally tractable than synchronous row deletion |
| "Walk me through a cache miss" | Cache miss → DB lookup → cache the result → return redirect | Same, but notes the stampede risk: if 10,000 requests miss simultaneously for the same cold key, all hit the DB at once. Mitigate with a per-key mutex or probabilistic early expiration | Distinguishes cold miss (key never cached) vs expired miss (key evicted) vs invalidated miss (key explicitly deleted). Each has different frequency and mitigation strategy |
How the pieces connect
Every architectural decision in this article traces back to a small number of observations established early on. The chain is worth assembling in one place:
1. Analytics is a stated NFR (§2) → 302 is the only valid default redirect code (§6)
2. <10 ms p99 latency NFR (§2) → DB alone cannot hit p99 → Redis cache is mandatory, not optional (§8)
3. 100:1 read:write ratio (§3) → power-law access distribution → 20% cache covers 80%+ of traffic → 1–10 GB Redis is sufficient (§3, §8)
4. Analytics consistency can be eventual (§2) → analytics writes can be async → Kafka pipeline never blocks the redirect path (§4, §9)
5. Redirect is a point lookup by short_code → shard key must be short_code → consistent hash on short_code routes every request to one shard (§7, §9)
6. URLs need durability (§2), clicks need write throughput (§3) → different consistency requirements → different stores (§7)
Related guides:
- Rate Limiter System Design — atomic Redis operations, distributed race conditions, and multi-tier quota enforcement
- Web Crawler System Design — Bloom filter deduplication, politeness throttling, and distributed frontier design
- Twitter Feed System Design — fan-out write amplification, hybrid push/pull strategy, and celebrity threshold design
- Notification Service System Design — multi-channel delivery, idempotency keys, and priority queues at scale
- Key-Value Store System Design — consistent hashing, eviction policies, replication, and failure modes
- Search Autocomplete
- Chat System (WhatsApp) System Design — WebSocket architecture, message delivery guarantees, and fan-out