How I Built an AI That Conducts Mock Coding Interviews


I wanted to practice for coding interviews. Not solve more LeetCode — I’d done plenty — but practice the actual interview: talking through my approach, fielding follow-up questions, coding under live pressure while someone watches.

The options were limited. Peer mock interviews are hard to schedule and inconsistent. Paid coaching is expensive. Recording yourself talking to a wall doesn’t give you feedback. So I built my own interviewer — an AI that listens, responds, probes, and evaluates. That became intervu.dev.

This post is about how it works, the tech decisions I made, and the things that turned out to be harder than expected.


The idea

A realistic mock coding interview has a few moving parts happening simultaneously:

  • The interviewer presents a problem and asks clarifying questions
  • The candidate talks through their approach out loud
  • There’s a code editor where the candidate writes and runs code
  • The interviewer reacts to what the candidate says and writes — nudging, hinting, or pushing back
  • At the end, the candidate gets structured feedback

I wanted all of that in a browser, with no install, no scheduling, and no other human required. The AI had to feel like a real interviewer — not a chatbot that says “great job!” regardless of what you do.


The stack

I kept the stack boring on purpose. The interesting problems are in the interaction design, not in the framework choices.

  • Frontend: Next.js (React), Monaco Editor for the code panel, Framer Motion for small UI touches
  • Backend: Python / FastAPI, with WebSocket endpoints for real-time audio streaming
  • Database: PostgreSQL — sessions, transcripts, scores, problems
  • Code execution: Docker sandboxes. The candidate’s code runs in an isolated container per language (Python, JavaScript, Java, C++, Go, TypeScript)
  • AI: Multi-provider LLM setup — OpenAI, Google, Groq — with different models assigned to different roles (interviewer, judge, summarizer)
  • Voice: Speech-to-text via Deepgram (streaming over WebSocket), text-to-speech via multiple providers (OpenAI, Google, Cartesia)
  • Auth: Clerk for registered users, plus a guest mode with localStorage-based identity
  • Infra: Docker Compose for dev, single-machine deploy for prod

Everything runs in Docker locally. docker compose up starts the backend, frontend, database, and the code runner base image. That was important — I wanted anyone (including future me) to be able to spin up the whole system in one command.


How the interview loop works

The core interaction is a turn-based conversation layered on top of a state machine. Here’s the basic flow:

  1. Candidate speaks → mic captures audio → streamed to the backend over WebSocket → transcribed in real time
  2. Backend processes the turn → the transcript plus conversation history is sent to the LLM interviewer, along with the current code in the editor
  3. AI responds → the response is streamed as text to the chat panel and simultaneously sent to TTS, which streams audio chunks back to the browser
  4. Candidate’s turn again → mic reactivates after TTS finishes
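
Stripped of the streaming details, one turn of that loop looks roughly like this. This is a sketch, not the real implementation — `build_messages` and `process_turn` are illustrative names, and in practice both the LLM reply and the TTS audio are streamed rather than returned whole:

```python
# Sketch of a single interview turn: assemble context, call the LLM
# interviewer, then hand the reply to TTS. Names are illustrative.

def build_messages(system_prompt, history, transcript, editor_code):
    """Combine conversation history, the candidate's new utterance, and
    the current editor contents into one LLM request."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # prior turns: alternating user/assistant
    messages.append({
        "role": "user",
        "content": f"{transcript}\n\n[Current editor contents]\n{editor_code}",
    })
    return messages

def process_turn(llm, tts, state, transcript, editor_code):
    """One turn: candidate transcript in, interviewer audio out."""
    messages = build_messages(
        state["prompt"], state["history"], transcript, editor_code
    )
    reply = llm(messages)  # provider call; streamed token-by-token in practice
    state["history"].append({"role": "user", "content": transcript})
    state["history"].append({"role": "assistant", "content": reply})
    tts(reply)  # audio chunks streamed back to the browser
    return reply
```

The important detail is that the editor contents ride along with every turn, so the interviewer always reacts to the code as it currently stands.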

The state machine tracks where you are in the interview: clarification phase, design phase, coding phase, testing phase. The LLM prompt changes based on the phase — early on it probes for understanding; later it focuses on code quality and edge cases.
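
The state machine itself is small. A minimal sketch (the phase prompts here are stand-ins — the real ones are much longer):

```python
from enum import Enum, auto

class Phase(Enum):
    CLARIFICATION = auto()
    DESIGN = auto()
    CODING = auto()
    TESTING = auto()

# Illustrative per-phase prompt fragments, not the production prompts.
PHASE_PROMPTS = {
    Phase.CLARIFICATION: "Probe whether the candidate understands the "
                         "problem. Reward good clarifying questions.",
    Phase.DESIGN: "Ask about approach and complexity before code is written.",
    Phase.CODING: "Watch for bugs but stay quiet unless asked or the "
                  "candidate is badly stuck.",
    Phase.TESTING: "Push for edge cases: empty input, duplicates, large N.",
}

# Phases advance linearly; a fuller version might allow looping back.
_ORDER = list(Phase)

def next_phase(current: Phase) -> Phase:
    i = _ORDER.index(current)
    return _ORDER[min(i + 1, len(_ORDER) - 1)]
```

Swapping the prompt fragment per phase is what makes the interviewer probe early and scrutinize code later without needing separate models.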

The tricky part isn’t any single step — it’s the orchestration. Audio playback has to finish before the mic goes hot. The code editor’s content has to be captured at the right moment. The LLM needs the full conversation context but shouldn’t see stale code. Getting the timing right across all of these took more iteration than writing any individual component.


Voice: the hardest part

Text chat works fine for demos. But real interviews are verbal. You think out loud. You get interrupted. The interviewer’s tone matters. So voice had to work well — not just technically, but naturally.

A few things I had to solve:

Turn-taking

When do you stop listening and let the AI respond? Too aggressive and you cut the candidate off mid-sentence. Too passive and there are awkward silences. I use a combination of silence detection (configurable threshold) and the STT provider’s endpoint detection.
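
The combination looks something like this — a sketch with illustrative thresholds, where both signals (local silence and the STT provider's endpoint event) have to agree before the turn closes:

```python
import time

class TurnDetector:
    """Decide when the candidate has finished speaking.

    Combines a configurable silence threshold with the STT provider's
    endpoint events. Requiring both signals cuts down on mid-sentence
    interruptions. Thresholds are illustrative, not production values.
    """

    def __init__(self, silence_threshold_s: float = 1.2):
        self.silence_threshold_s = silence_threshold_s
        self.last_speech_at = time.monotonic()
        self.stt_says_final = False

    def on_audio(self, is_speech: bool) -> None:
        """Called per audio frame with the local voice-activity result."""
        if is_speech:
            self.last_speech_at = time.monotonic()
            self.stt_says_final = False  # candidate resumed; reopen the turn

    def on_stt_event(self, is_endpoint: bool) -> None:
        """Called when the streaming STT provider signals an endpoint."""
        self.stt_says_final = is_endpoint

    def turn_is_over(self) -> bool:
        silent_for = time.monotonic() - self.last_speech_at
        return self.stt_says_final and silent_for >= self.silence_threshold_s
```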

A huge win here was switching to Deepgram’s new Flux models. Their end-of-turn detection is incredibly accurate — it understands the difference between a natural pause mid-thought (“So I think the time complexity is… O(N)”) and the actual end of an answer. It dramatically reduced the number of times the interviewer accidentally interrupted the candidate, making the whole interaction feel much more human.


Code execution in the browser

The candidate writes code in a Monaco editor (same engine as VS Code). When they hit “Run”, the code is sent to the backend, which spins up a Docker container for the selected language, runs the code with a timeout, and returns stdout/stderr.

Each language has its own Docker image with the runtime pre-installed. The sandbox is ephemeral — created per run, destroyed after. No persistent state, no network access from inside the container. This keeps things safe even though users are running arbitrary code.
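
In sketch form, the sandbox is just a carefully locked-down `docker run`. The image names and resource limits below are illustrative, not the real configuration:

```python
import subprocess

# Illustrative image names; the real setup has one image per language.
RUNNER_IMAGES = {
    "python": "intervu-runner-python",
    "javascript": "intervu-runner-node",
}

def build_run_cmd(language, host_dir, entry_cmd, timeout_s=10):
    """Build a `docker run` invocation for one ephemeral sandbox."""
    return [
        "docker", "run",
        "--rm",                    # destroy the container after the run
        "--network", "none",       # no network access from inside
        "--memory", "256m",        # cap memory (illustrative limit)
        "--cpus", "1",
        "-v", f"{host_dir}:/code:ro",   # candidate code, read-only
        RUNNER_IMAGES[language],
        "timeout", str(timeout_s), *entry_cmd,
    ]

def run_sandboxed(language, host_dir, entry_cmd):
    cmd = build_run_cmd(language, host_dir, entry_cmd)
    # Outer timeout is a safety net on top of the in-container one.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=30)
```

`--rm` gives you the ephemeral lifecycle, `--network none` the isolation, and the mounted directory is read-only so a run can't tamper with its own inputs.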

The backend also handles test runs. When a problem is loaded, it comes with test cases. The candidate can run their code against them at any point, and the results show up inline — which tests passed, which failed, what the expected vs. actual output was.
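
The comparison step is simple once the sandbox returns stdout. A sketch (`run_code` stands in for a sandboxed execution with the case's input on stdin; the names and test-case shape are illustrative):

```python
def check_tests(run_code, test_cases):
    """Run the candidate's code against each test case and report
    pass/fail along with expected vs. actual output.

    `run_code(stdin)` returns the program's stdout; `test_cases` is a
    list of {"input": ..., "expected": ...} dicts.
    """
    results = []
    for case in test_cases:
        actual = run_code(case["input"]).strip()
        results.append({
            "input": case["input"],
            "expected": case["expected"],
            "actual": actual,
            "passed": actual == case["expected"],
        })
    return results
```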

Magic problem import

The coolest feature, though, is the custom importer. If you have a specific LeetCode problem you want to practice, you can just paste the URL into intervu.dev. The system scrapes the problem description, uses an LLM to generate test cases and scaffolding, and sets everything up automatically. Wait 30 seconds, and you’re in a full mock interview for that exact problem. It’s incredibly powerful for targeted practice.
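
The first step of that pipeline is just URL parsing; the scraping and LLM scaffolding that follow are the interesting parts and are omitted here. A sketch of the slug extraction (the function name is illustrative):

```python
from urllib.parse import urlparse

def extract_problem_slug(url):
    """Pull the problem slug out of a LeetCode problem URL, e.g.
    https://leetcode.com/problems/two-sum/ -> "two-sum".

    The subsequent steps (scraping the description, LLM-generating test
    cases and scaffolding) are not shown.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "problems":
        raise ValueError(f"not a LeetCode problem URL: {url}")
    return parts[1]
```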


The AI interviewer prompt

The LLM prompt is where most of the “personality” lives. A few design decisions:

Socratic, not didactic. The interviewer never gives you the answer. It asks leading questions. “What’s the time complexity of that approach?” or “What happens if the input is empty?” — not “You should use a hashmap.”

Phase-aware. The prompt knows what phase of the interview you’re in. During clarification, it rewards questions. During coding, it watches for bugs but bites its tongue unless asked. During testing, it prompts for edge cases.

Signal-based evaluation. At the end, a separate “judge” model evaluates the session across multiple dimensions: problem understanding, approach quality, communication, code quality, testing. Each gets a score and written feedback. This isn’t just a single thumbs-up/thumbs-down — it’s modeled after real FAANG rubrics.
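
The judge's output maps naturally onto a small data model. A sketch — the dimensions match the list above, but the scoring scale and aggregation are illustrative:

```python
from dataclasses import dataclass

# Evaluation dimensions, matching the rubric described above.
DIMENSIONS = (
    "problem_understanding",
    "approach_quality",
    "communication",
    "code_quality",
    "testing",
)

@dataclass
class DimensionScore:
    dimension: str
    score: int      # e.g. 1-4, rubric-style (illustrative scale)
    feedback: str   # written feedback from the judge model

def overall(scores):
    """Unweighted average across dimensions; a real rubric might weight
    dimensions differently or require a minimum bar in each."""
    return sum(s.score for s in scores) / len(scores)
```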

Getting the prompt right was (and still is) an ongoing process. Too strict and the AI feels like a drill sergeant. Too lenient and it says “great!” when you write buggy code. The balance is hard, and I’ve rewritten the core prompt maybe 20 times.


A few things I learned

The “boring” parts take the most time. Auth, session management, error handling, deployment scripts, database migrations — none of this is exciting, but it’s probably 60% of the total effort.

Voice changes everything. A text-only version of this could be built in a weekend. Adding voice (STT + TTS + turn-taking + latency optimization) tripled the complexity and made the product 10x better. If your AI app can benefit from voice, it’s worth the pain.

LLM behavior is hard to test. You can’t unit-test “does the AI give good hints?” I ended up building a simulation mode where a separate “candidate” LLM plays the role of the interviewee, and I review the transcripts manually. It’s slow, but it catches prompt regressions before users do.
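
The simulation harness doesn't need to be fancy. A sketch — `interviewer` and `candidate` are callables that each wrap an LLM call with its own prompt in practice; here they're just role-playing functions over the transcript so far:

```python
def simulate_interview(interviewer, candidate, opening, max_turns=6):
    """Run a scripted interview between two LLM-backed roles and return
    the transcript for manual review.

    Each callable receives the transcript so far (a list of
    (role, utterance) tuples) and returns the next utterance.
    """
    transcript = [("interviewer", opening)]
    for _ in range(max_turns):
        transcript.append(("candidate", candidate(transcript)))
        transcript.append(("interviewer", interviewer(transcript)))
    return transcript
```

Running this against a saved "known-good" transcript after every prompt change is a crude but effective regression check.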

Docker-in-Docker is annoying but necessary. The backend runs in Docker and spawns runner containers for code execution. That means mounting the Docker socket. It works, but it adds deployment complexity and makes local development trickier than I’d like.


In short

I built an AI mock coding interviewer because I needed one and it didn’t exist. The tech is straightforward — Next.js, FastAPI, Postgres, Docker, WebSockets, LLMs — but the interaction design (voice timing, state management, prompt engineering, code sandboxing) is where the real complexity lives. If you’re preparing for coding interviews, the gap between “can solve the problem” and “can perform well in the interview” is real, and closing it requires practice under realistic conditions. That’s what this tool is for.