📖 6 min read
Last updated on

Find Median from Data Stream — Coding Interview Walkthrough


Hero Image for Find Median from Data Stream — Coding Interview Walkthrough

You’re in a coding interview. The interviewer says: “Design a data structure that supports adding numbers and querying the median at any time.” You know the classic solution uses two heaps. But now you need to explain the invariant, implement max-heap negation in Python, handle the rebalancing logic without errors, and justify the O(log n) / O(1) complexity split. Find Median from Data Stream is LeetCode #295, a heap design problem that tests your ability to maintain an invariant across insertions.

This walkthrough covers the two-heap approach in detail, including the routing pattern that keeps both halves balanced and the Python-specific max-heap workaround.


The Problem

Implement a MedianFinder class:

  • addNum(num: int) adds a number from the data stream.
  • findMedian() -> float returns the median of all added numbers.

If the count of numbers is even, the median is the average of the two middle values. (LeetCode #295)

Example

medianFinder.addNum(1)    # stream: [1]
medianFinder.addNum(2)    # stream: [1, 2]
medianFinder.findMedian() # returns 1.5
medianFinder.addNum(3)    # stream: [1, 2, 3]
medianFinder.findMedian() # returns 2.0

Already comfortable with the solution? Practice it in a mock interview →


What Interviewers Are Actually Testing

This is a data structure design problem. The interviewer wants to see whether you can maintain a complex invariant (two balanced heaps) across dynamic operations and handle Python’s min-heap-only limitation.

SkillWhat Interviewers Observe
Two-heap designDo you use a max-heap for the lower half and min-heap for the upper half?
Balance invariantDo you rebalance after each insertion to keep sizes within 1?
Python max-heap simulationDo you negate values to simulate a max-heap with heapq?
Median formulaDo you correctly handle both even and odd count cases?
Trade-off awarenessDo you know the O(log n) insertion / O(1) median complexity?

Step 1: Clarifying Questions

  1. Can numbers be negative? Yes. This doesn’t change the algorithm, but the negation for max-heap simulation needs to work with negative values too.

  2. Will findMedian be called on an empty stream? No. At least one addNum call precedes any findMedian call.

  3. Can duplicates be added? Yes. Heaps handle duplicates naturally.


Step 2: A First (Non-Optimal) Idea

Maintain a sorted list. On each addNum, insert in sorted position using bisect.insort (O(n) due to shifting). On findMedian, return the middle element(s) in O(1).

ApproachaddNumfindMedianSpace
Sorted list (bisect.insort)O(n)O(1)O(n)
Two heapsO(log n)O(1)O(n)

The sorted list works but O(n) per insertion is too slow for large streams. Two heaps bring insertion down to O(log n).


Step 3: The Key Insight

Maintain two heaps: a max-heap lo for the lower half, and a min-heap hi for the upper half. Invariant: len(lo) >= len(hi) and every element in lo is less than or equal to every element in hi. The median is lo[0] when the total count is odd, or (lo[0] + hi[0]) / 2 when even.

The routing pattern on each insertion:

  1. Push num onto lo (max-heap).
  2. Push the max of lo onto hi (to maintain the ordering invariant).
  3. If len(hi) > len(lo), pop the min of hi back to lo.

This always routes new numbers through lo first (ensuring they get compared to the lower half) before potentially moving the maximum to hi.

Two heaps side by side: lo max-heap [3,1] and hi min-heap [5,8], with median = (3+5)/2 = 4.0. ---

Step 4: Optimal Strategy

  1. Initialize two heaps: lo (max-heap, stored as negated values) and hi (min-heap).
  2. For each addNum(num):
    • Push -num to lo.
    • Pop the max from lo (negate to get the value), push it to hi.
    • If len(hi) > len(lo), pop the min from hi, push its negation to lo.
  3. For findMedian():
    • If len(lo) > len(hi): return -lo[0].
    • If len(lo) == len(hi): return (-lo[0] + hi[0]) / 2.0.

Python Implementation

import heapq

class MedianFinder:
    def __init__(self):
        self.lo = []  # max-heap (negate values)
        self.hi = []  # min-heap

    def addNum(self, num: int) -> None:
        # Push to lo (max-heap via negation)
        heapq.heappush(self.lo, -num)
        # Balance: lo's max must be <= hi's min
        heapq.heappush(self.hi, -heapq.heappop(self.lo))
        # Keep lo size >= hi size
        if len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def findMedian(self) -> float:
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2.0

Why this is interview-friendly:

  • Three heap operations per addNum. Push to lo, pop-push to hi, conditional pop-push back. Each is O(log n). Clean and deterministic.
  • Negation handles max-heap. Python’s heapq is min-heap only. Storing -num in lo simulates a max-heap. Access the max via -lo[0].
  • No size tracking variables. len(lo) and len(hi) give sizes directly. No separate counter to maintain.

Time & Space Complexity

TimeSpace
addNumO(log n)O(n) total
findMedianO(1)O(1)

Each insertion does at most 3 heap operations (push, pop, push). Median is a constant-time peek at heap tops.

A common follow-up: “Can you do better?” For arbitrary streaming data, O(log n) insertion with O(1) median is optimal. With bounded integer ranges, a Fenwick tree or bucket sort could achieve O(log range) for both operations.


Common Interview Mistakes

  1. Forgetting to negate for max-heap. Python’s heapq is a min-heap. Push -num to lo and use -lo[0] to read the max. Forgetting one negation produces wrong results silently.

  2. Not rebalancing after insertion. Without rebalancing, sizes can drift and the median formula breaks. Always ensure len(lo) >= len(hi) and len(lo) - len(hi) <= 1.

  3. Sorting on each findMedian call. O(n log n) per query defeats the purpose. The two-heap approach reduces findMedian to O(1).

  4. Off-by-one in the median formula. When sizes are equal (even count): average the two tops. When lo has one more (odd count): return lo’s top. Getting this backwards returns the wrong value.

  5. Routing directly to hi first. The pattern must route through lo first to maintain the ordering invariant. Routing to hi first can violate the property that all elements in lo are less than or equal to all elements in hi.


What a Strong Candidate Sounds Like

“I’ll use two heaps: a max-heap for the lower half and a min-heap for the upper half. After each insertion, I route the element through the max-heap and push its max to the min-heap to preserve the ordering invariant, then rebalance sizes. The median is the top of the larger heap (odd count) or the average of both tops (even count). addNum is O(log n), findMedian is O(1).”


Example Interview Dialogue

Interviewer: Walk through adding 5, 2, 8, 1.

Candidate: Add 5: lo=[-5], hi=[]. Median: 5. Add 2: push -2 to lo=[-5,-2], pop 5 to hi=[5]. lo=[-2], hi=[5]. len(hi) <= len(lo)? No, equal, no rebalance. Median: (-(-2) + 5)/2 = 3.5. Add 8: push -8 to lo=[-8,-2], pop 8 to hi=[5,8]. len(hi)=2 > len(lo)=1, rebalance: pop 5 from hi, push -5 to lo=[-5,-2]. lo=[-5,-2], hi=[8]. Median: 5. Add 1: push -1 to lo=[-5,-2,-1], pop 5 to hi=[5,8]. lo=[-2,-1], hi=[5,8]. Equal sizes, median: (2+5)/2 = 3.5.

Interviewer: What about very large streams?

Candidate: The two-heap approach is already optimal for arbitrary streams: O(log n) per insertion, O(1) per median, O(n) space. For bounded integer values, a counting array or Fenwick tree could reduce insertion to O(1) or O(log range).



Practice This in a Mock Interview

Find Median from Data Stream is a canonical heap design problem. The two-heap invariant is the whole key. Understanding it well enough to explain why the route-through-lo-then-rebalance pattern always works, and implementing it cleanly with Python’s negation trick, is what separates a complete answer from a partial one.

Start a mock interview for Find Median from Data Stream on Intervu.

Further reading:

Practice Like It's the Real Interview

Get instant feedback on your approach, communication, and code — powered by AI.

Start a Mock Interview →