This started with a WhatsApp message from a doctor friend: "Vik, I spend four hours a day on paperwork. Four hours. I became a doctor to help people, not to type."
So naturally, I decided to build an AI that could handle the documentation while doctors actually, you know, doctor.
The Problem
Here's the dirty secret of modern medicine: doctors spend more time documenting than treating. After every consultation, they need to write up symptoms, potential diagnoses, treatment plans, follow-ups. All in professional medical language, all compliant with record-keeping standards.
My doctor friend described his typical day: see a patient for 15 minutes, spend 20 minutes writing notes about it. See another patient. More notes. By the time he's done, he's exhausted and hasn't even started on the complex cases.
The existing solutions? Either prohibitively expensive enterprise software, or voice-to-text tools that produce word salad requiring heavy editing. Neither actually solved the problem.
The Stack
Building this thing required more moving parts than I initially anticipated. The core challenge: real-time transcription with speaker identification, AI that understands medical context, and a workflow doctors could actually use during consultations.
Frontend: Next.js 15 with React 19. Went with the App Router because I apparently enjoy pain. TailwindCSS and Shadcn UI for the interface. The whole thing needs to work on tablets since doctors don't sit at desks during consultations.
Backend: Firebase everything. Firestore for patient records and consultations, Firebase Auth for access control, Firebase Functions for email automation. Firebase App Hosting for deployment because I've learned my lesson about DIY deployment nightmares.
The AI Bits: Deepgram Nova-3 handles the real-time transcription. It's genuinely impressive. Streams audio over WebSocket, returns transcribed text with speaker diarization (automatically figures out who's the doctor and who's the patient). Gemini 2.5 handles the medical analysis and report generation.
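Deepgram's streaming responses include per-word speaker labels, which arrive as a flat list. To turn that into a readable doctor/patient dialogue you merge consecutive same-speaker words into turns. A minimal sketch of that grouping step, using a simplified subset of the word payload (`punctuated_word` and `speaker` are real Deepgram fields, but the exact response shape here is trimmed for illustration):

```typescript
// Simplified subset of Deepgram's diarized word objects.
interface DiarizedWord {
  punctuated_word: string;
  speaker: number; // 0, 1, ... as assigned by diarization
}

interface Turn {
  speaker: number;
  text: string;
}

// Merge consecutive words from the same speaker into conversational turns,
// so the word stream "0 0 1 1 0" becomes three turns instead of five fragments.
function groupIntoTurns(words: DiarizedWord[]): Turn[] {
  const turns: Turn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += " " + w.punctuated_word;
    } else {
      turns.push({ speaker: w.speaker, text: w.punctuated_word });
    }
  }
  return turns;
}
```

Mapping speaker 0 to "Doctor" and speaker 1 to "Patient" is then a display concern, not a transcription one.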
Email System: Resend with React Email templates. Sounds simple until you need webhook signature verification, retry logic with exponential backoff, and rate limit handling. Nothing is ever simple.
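The retry-with-exponential-backoff part looks roughly like this. This is a generic sketch, not Mediate's actual code; the attempt count and base delay are illustrative defaults:

```typescript
// Retry an async operation, doubling the delay after each failure:
// 500ms, 1s, 2s, ... until maxAttempts is exhausted.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 500
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err; // keep the most recent failure for the final throw
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
      }
    }
  }
  throw lastErr;
}
```

Wrapping the Resend send call in `withRetry` handles transient rate-limit errors without hammering the API.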
The Hard Parts
Every project has that moment where you question all your life choices. Mediate had several.
WebSocket Reliability: The initial production deployment was a disaster. Transcription worked perfectly in development, then failed mysteriously in production. Turns out, routing WebSocket connections through Firebase Functions added latency and created weird race conditions.
The fix: unified WebSocket architecture that connects directly to Deepgram in both environments. Environment-aware API key handling (client-side for dev, server-provided for production). Sounds obvious in retrospect. Took me three weeks to figure out.
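The environment-aware key handling boils down to one decision: where does the client get its Deepgram credentials? A hedged sketch of that logic (the env var name and endpoint path are illustrative, not Mediate's real ones):

```typescript
// In development the client may use a local key directly; in production the
// key comes from a server endpoint so it never ships in the JS bundle.
type KeySource =
  | { kind: "local"; key: string }
  | { kind: "server"; endpoint: string };

function resolveKeySource(
  env: "development" | "production",
  localKey?: string
): KeySource {
  if (env === "development" && localKey) {
    return { kind: "local", key: localKey };
  }
  // Production (or dev without a key): fetch a short-lived token server-side.
  return { kind: "server", endpoint: "/api/deepgram-token" };
}
```

Either way, the browser then opens the WebSocket straight to Deepgram; only the credential path differs between environments.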
Report Generation Speed: The first version took 10-15 seconds to generate a report. Doctors would finish a consultation, click "Generate Report," then sit there watching a spinner. Terrible UX.
I optimised the prompts, switched to Gemini Flash for the initial analysis (saving Pro for the final report), and reduced unnecessary context. Got it down to under 2 seconds, an improvement of roughly 85-90% that took embarrassingly long to achieve.
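The two-stage split is simple to express in code. A sketch of the routing, with illustrative model IDs:

```typescript
// Fast model for the first extraction pass over the transcript, the larger
// model only for the doctor-facing report. Model IDs are illustrative.
type Stage = "extract" | "report";

const MODEL_FOR_STAGE: Record<Stage, string> = {
  extract: "gemini-2.5-flash", // cheap and fast: symptoms, diagnoses, questions
  report: "gemini-2.5-pro",    // higher quality for the final written report
};

function modelFor(stage: Stage): string {
  return MODEL_FOR_STAGE[stage];
}
```

The win comes from the extract stage running while the doctor is still reviewing, so only the final report generation sits on the critical path.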
The TypeScript Nightmare: At one point, I had 210 linting errors blocking deployment. Two hundred and ten. The codebase had grown organically (read: messily), and TypeScript was not having it.
I spent a week doing nothing but creating proper interface definitions. ReviewedConsultationData with nested types for symptoms, diagnoses, questions, and follow-ups. Probably should have done that from the start. Hindsight is cruel.
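For a sense of what those interface definitions look like, here is a sketch of the nested shape. The field names are illustrative; the real `ReviewedConsultationData` in Mediate's codebase almost certainly differs:

```typescript
interface Symptom {
  description: string;
  duration?: string; // e.g. "3 days", optional when the patient is unsure
}

interface Diagnosis {
  condition: string;
  confidence: "likely" | "possible" | "rule-out";
}

interface FollowUp {
  action: string;
  dueInDays: number;
}

// Everything the report generator needs, after doctor review.
interface ReviewedConsultationData {
  patientId: string;
  symptoms: Symptom[];
  diagnoses: Diagnosis[];
  questions: string[];
  followUps: FollowUp[];
  approvedByDoctor: boolean; // AI output goes nowhere without this
}
```

Once the shape is explicit, the compiler catches a whole class of "field missing at step 4" bugs before they reach a doctor.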
Patient Data Flow: Complex consultation workflows caused patient information to mysteriously vanish between steps. Data would disappear between patient selection and the final report. A ghost in the machine.
The solution was comprehensive data preservation with explicit passing between components. Created a consultation context that maintains all patient data throughout the workflow. Less elegant than I'd like, but it works.
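The core of that preservation rule can be sketched as a pure state transition: each workflow step may add fields, but an `undefined` value must never clobber data an earlier step already set. This is a simplified illustration, not Mediate's actual context implementation:

```typescript
interface ConsultationState {
  patientId?: string;
  patientName?: string;
  transcript?: string;
  report?: string;
}

// Advance the workflow one step, merging the update into existing state.
// Undefined values are skipped so a step can't silently erase patient data.
function advance(
  state: ConsultationState,
  update: Partial<ConsultationState>
): ConsultationState {
  const next = { ...state };
  for (const [k, v] of Object.entries(update)) {
    if (v !== undefined) {
      (next as Record<string, unknown>)[k] = v;
    }
  }
  return next;
}
```

Returning a fresh object (rather than mutating) also plays nicely with React's change detection, which is part of why the context approach stopped the vanishing-data bug.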
What I Learned
Building for a regulated industry is different. Medical software has compliance requirements, legal implications, and real consequences if something goes wrong. Every AI suggestion needs doctor approval before it goes anywhere. No shortcuts.
Real-time systems are hard. The gap between "working demo" and "production-ready" is enormous. Latency matters. Error handling matters. Reconnection logic matters. All the boring stuff becomes critical.
Also: doctors are surprisingly patient beta testers. My friend and his colleagues have been using the beta, reporting bugs, suggesting features, and generally being more forgiving than I deserve. Healthcare professionals deal with actual life-or-death situations daily; an occasionally buggy app doesn't faze them.
Current Status
Mediate is in beta at mediate-app.com. Version 2.9.1-beta as of writing. A handful of doctors are using it for real consultations, which is both exciting and mildly terrifying.
The core workflow is solid: record consultation, get real-time transcription with speaker identification, review AI-extracted symptoms and diagnoses, approve or edit, generate professional report. Doctors can export to PDF or plain text for their EHR systems.
Still on the roadmap: organisation support for multi-doctor practices, email delivery of reports, and integrations with existing medical record systems. The unglamorous stuff that makes software actually useful in the real world.
My friend no longer spends four hours on paperwork. He's down to about 45 minutes. That's three hours back in his day for seeing patients, or occasionally, having lunch. Progress.