Find Median from Data Stream — Coding Interview Walkthrough
You’re in a coding interview. The interviewer says: “Design a data structure that supports adding numbers and querying the median at any time.” You know the classic solution uses two heaps. But now you need to explain the invariant, implement max-heap negation in Python, handle the rebalancing logic without errors, and justify the O(log n) / O(1) complexity split. Find Median from Data Stream is LeetCode #295, a heap design problem that tests your ability to maintain an invariant across insertions.
This walkthrough covers the two-heap approach in detail, including the routing pattern that keeps both halves balanced and the Python-specific max-heap workaround.
The Problem
Implement a MedianFinder class:
addNum(num: int)adds a number from the data stream.findMedian() -> floatreturns the median of all added numbers.
If the count of numbers is even, the median is the average of the two middle values. (LeetCode #295)
Example
medianFinder.addNum(1) # stream: [1]
medianFinder.addNum(2) # stream: [1, 2]
medianFinder.findMedian() # returns 1.5
medianFinder.addNum(3) # stream: [1, 2, 3]
medianFinder.findMedian() # returns 2.0
Already comfortable with the solution? Practice it in a mock interview →
What Interviewers Are Actually Testing
This is a data structure design problem. The interviewer wants to see whether you can maintain a complex invariant (two balanced heaps) across dynamic operations and handle Python’s min-heap-only limitation.
| Skill | What Interviewers Observe |
|---|---|
| Two-heap design | Do you use a max-heap for the lower half and min-heap for the upper half? |
| Balance invariant | Do you rebalance after each insertion to keep sizes within 1? |
| Python max-heap simulation | Do you negate values to simulate a max-heap with heapq? |
| Median formula | Do you correctly handle both even and odd count cases? |
| Trade-off awareness | Do you know the O(log n) insertion / O(1) median complexity? |
Step 1: Clarifying Questions
-
Can numbers be negative? Yes. This doesn’t change the algorithm, but the negation for max-heap simulation needs to work with negative values too.
-
Will
findMedianbe called on an empty stream? No. At least oneaddNumcall precedes anyfindMediancall. -
Can duplicates be added? Yes. Heaps handle duplicates naturally.
Step 2: A First (Non-Optimal) Idea
Maintain a sorted list. On each addNum, insert in sorted position using bisect.insort (O(n) due to shifting). On findMedian, return the middle element(s) in O(1).
| Approach | addNum | findMedian | Space |
|---|---|---|---|
| Sorted list (bisect.insort) | O(n) | O(1) | O(n) |
| Two heaps | O(log n) | O(1) | O(n) |
The sorted list works but O(n) per insertion is too slow for large streams. Two heaps bring insertion down to O(log n).
Step 3: The Key Insight
Maintain two heaps: a max-heap
lofor the lower half, and a min-heaphifor the upper half. Invariant:len(lo) >= len(hi)and every element inlois less than or equal to every element inhi. The median islo[0]when the total count is odd, or(lo[0] + hi[0]) / 2when even.
The routing pattern on each insertion:
- Push
numontolo(max-heap). - Push the max of
loontohi(to maintain the ordering invariant). - If
len(hi) > len(lo), pop the min ofhiback tolo.
This always routes new numbers through lo first (ensuring they get compared to the lower half) before potentially moving the maximum to hi.
---
Step 4: Optimal Strategy
- Initialize two heaps:
lo(max-heap, stored as negated values) andhi(min-heap). - For each
addNum(num):- Push
-numtolo. - Pop the max from
lo(negate to get the value), push it tohi. - If
len(hi) > len(lo), pop the min fromhi, push its negation tolo.
- Push
- For
findMedian():- If
len(lo) > len(hi): return-lo[0]. - If
len(lo) == len(hi): return(-lo[0] + hi[0]) / 2.0.
- If
Python Implementation
import heapq
class MedianFinder:
def __init__(self):
self.lo = [] # max-heap (negate values)
self.hi = [] # min-heap
def addNum(self, num: int) -> None:
# Push to lo (max-heap via negation)
heapq.heappush(self.lo, -num)
# Balance: lo's max must be <= hi's min
heapq.heappush(self.hi, -heapq.heappop(self.lo))
# Keep lo size >= hi size
if len(self.hi) > len(self.lo):
heapq.heappush(self.lo, -heapq.heappop(self.hi))
def findMedian(self) -> float:
if len(self.lo) > len(self.hi):
return -self.lo[0]
return (-self.lo[0] + self.hi[0]) / 2.0
Why this is interview-friendly:
- Three heap operations per
addNum. Push to lo, pop-push to hi, conditional pop-push back. Each is O(log n). Clean and deterministic. - Negation handles max-heap. Python’s
heapqis min-heap only. Storing-numinlosimulates a max-heap. Access the max via-lo[0]. - No size tracking variables.
len(lo)andlen(hi)give sizes directly. No separate counter to maintain.
Time & Space Complexity
| Time | Space | |
|---|---|---|
addNum | O(log n) | O(n) total |
findMedian | O(1) | O(1) |
Each insertion does at most 3 heap operations (push, pop, push). Median is a constant-time peek at heap tops.
A common follow-up: “Can you do better?” For arbitrary streaming data, O(log n) insertion with O(1) median is optimal. With bounded integer ranges, a Fenwick tree or bucket sort could achieve O(log range) for both operations.
Common Interview Mistakes
-
Forgetting to negate for max-heap. Python’s
heapqis a min-heap. Push-numtoloand use-lo[0]to read the max. Forgetting one negation produces wrong results silently. -
Not rebalancing after insertion. Without rebalancing, sizes can drift and the median formula breaks. Always ensure
len(lo) >= len(hi)andlen(lo) - len(hi) <= 1. -
Sorting on each
findMediancall. O(n log n) per query defeats the purpose. The two-heap approach reducesfindMedianto O(1). -
Off-by-one in the median formula. When sizes are equal (even count): average the two tops. When
lohas one more (odd count): returnlo’s top. Getting this backwards returns the wrong value. -
Routing directly to
hifirst. The pattern must route throughlofirst to maintain the ordering invariant. Routing tohifirst can violate the property that all elements inloare less than or equal to all elements inhi.
What a Strong Candidate Sounds Like
“I’ll use two heaps: a max-heap for the lower half and a min-heap for the upper half. After each insertion, I route the element through the max-heap and push its max to the min-heap to preserve the ordering invariant, then rebalance sizes. The median is the top of the larger heap (odd count) or the average of both tops (even count).
addNumis O(log n),findMedianis O(1).”
Example Interview Dialogue
Interviewer: Walk through adding 5, 2, 8, 1.
Candidate: Add 5: lo=[-5], hi=[]. Median: 5. Add 2: push -2 to lo=[-5,-2], pop 5 to hi=[5]. lo=[-2], hi=[5]. len(hi) <= len(lo)? No, equal, no rebalance. Median: (-(-2) + 5)/2 = 3.5. Add 8: push -8 to lo=[-8,-2], pop 8 to hi=[5,8]. len(hi)=2 > len(lo)=1, rebalance: pop 5 from hi, push -5 to lo=[-5,-2]. lo=[-5,-2], hi=[8]. Median: 5. Add 1: push -1 to lo=[-5,-2,-1], pop 5 to hi=[5,8]. lo=[-2,-1], hi=[5,8]. Equal sizes, median: (2+5)/2 = 3.5.
Interviewer: What about very large streams?
Candidate: The two-heap approach is already optimal for arbitrary streams: O(log n) per insertion, O(1) per median, O(n) space. For bounded integer values, a counting array or Fenwick tree could reduce insertion to O(1) or O(log range).
Related Problems
- Sliding Window Median (LeetCode #480). Median in a moving window, a harder variant requiring lazy deletion from heaps.
- Kth Largest Element in a Stream (LeetCode #703). Single min-heap to track kth largest.
- K Closest Points to Origin (LeetCode #973). Heap-based selection of k elements.
Practice This in a Mock Interview
Find Median from Data Stream is a canonical heap design problem. The two-heap invariant is the whole key. Understanding it well enough to explain why the route-through-lo-then-rebalance pattern always works, and implementing it cleanly with Python’s negation trick, is what separates a complete answer from a partial one.
Start a mock interview for Find Median from Data Stream on Intervu.
Further reading:
- How to Prepare for a Coding Interview in 2026, the complete roadmap and study plans
- Why LeetCode alone isn’t enough, and what to practice instead
- Practice any LeetCode problem as a live mock interview
- The Grind 75 Pathway, a structured study plan with AI practice links
- Browse all walkthroughs on GitHub, condensed solutions with code