
Why We're Building Raven, and Why Open Source

  • Writer: Chaitanya Laxman
  • Apr 6
  • 5 min read

Most meeting intelligence tools are browser extensions. We think that's the wrong architecture. Here's why we built Raven Pro as a native Electron desktop app, what that decision cost us in engineering effort, and what it buys the user.


The problem with meeting transcription today


Every knowledge worker sits through hours of calls per week. The information exchanged on those calls - decisions, commitments, context - evaporates within hours. People take bad notes, miss details, and spend time after meetings reconstructing what was said.


The market's response has been a wave of "meeting assistant" tools. Most follow the same pattern: a Chrome extension that hooks into your browser tab, captures audio, and sends it to a cloud API for transcription. Some join your call as a visible bot participant. A few require you to share a meeting link and record server-side.


These approaches work, sort of. But they all have the same set of limitations that we kept running into as users ourselves.


Browser extensions are sandboxed. They can only capture audio from the tab they're attached to, which means they break when you switch tabs, when the meeting platform updates its DOM, or when the call happens in a desktop app like Slack or Discord. Bot-based tools announce their presence to everyone on the call - sometimes that's fine, often it's not. Server-side recording requires meeting links and calendar integrations, adding friction and failing for ad hoc calls.


We wanted something that just works. Open the app, join your meeting on whatever platform you want, and get real-time transcription and AI assistance without anyone else knowing.


The case for a native desktop app


Raven Pro is an Electron app that runs on macOS and Windows. It captures audio at the OS level - system audio (what the remote speaker says) and microphone audio (what you say) - as two separate streams. It doesn't care what meeting platform you're using. Zoom, Meet, Teams, Discord, a phone call routed through your computer - it's all the same to Raven, because it captures the raw audio output of your system.

This is a fundamentally different approach from browser extensions, and it's harder to build. But it solves the right problems.


On macOS, we use ScreenCaptureKit for system audio capture and CoreAudio for the microphone. On Windows, we wrote a Rust module that uses WASAPI - the Windows Audio Session API - for both loopback capture (system audio) and standard input capture (mic). We chose Rust for the Windows audio layer because WASAPI's COM-based API is unforgiving with memory management, and Rust's ownership model prevents the class of bugs that would otherwise show up as intermittent audio glitches or crashes under load.

The overlay - where you see the live transcript and AI responses - is rendered as a transparent, always-on-top window that's excluded from screen sharing. On macOS, this uses NSWindow level configuration. On Windows, we use the WDA_EXCLUDEFROMCAPTURE display affinity flag. The result: you see the transcript, but when you share your screen, it's invisible to everyone else.
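In Electron terms, most of that overlay behavior maps onto a handful of BrowserWindow calls. A minimal sketch, assuming Electron's documented API - `setContentProtection(true)` is what toggles WDA_EXCLUDEFROMCAPTURE on Windows and the non-shareable sharing type on macOS. The helper names are ours, not Raven's actual code:

```javascript
// Sketch of an always-on-top, capture-excluded overlay window in Electron.
// overlayOptions() and configureOverlay() are illustrative names, not Raven's code.

function overlayOptions() {
  // Constructor options for a transparent, chromeless overlay.
  return {
    transparent: true, // no opaque background behind the transcript
    frame: false,      // no title bar or borders
    alwaysOnTop: true,
    skipTaskbar: true, // keep it out of the taskbar / app switcher
    hasShadow: false,
  };
}

function configureOverlay(win) {
  // 'screen-saver' is one of the highest always-on-top levels Electron exposes.
  win.setAlwaysOnTop(true, 'screen-saver');
  // Excludes the window from screen capture: WDA_EXCLUDEFROMCAPTURE on
  // Windows, a non-shareable sharingType on macOS.
  win.setContentProtection(true);
}
```

In practice you'd pass `overlayOptions()` to `new BrowserWindow(...)` and call `configureOverlay(win)` right after creation.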


Echo cancellation: the hard part


Capturing two audio streams is necessary but not sufficient. The remote speaker's voice comes through your speakers and gets picked up by your microphone. Without echo cancellation, your transcription engine processes the same words twice - once from the system audio stream (correct) and once from the mic stream (echo). The transcript becomes garbled.
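To see why the reference signal matters, here's a toy echo canceller in plain JavaScript - a normalized LMS adaptive filter, far simpler than a production AEC, but the same principle: the far-end (system audio) stream is the reference, the canceller learns the echo path, and it subtracts its prediction from the mic stream. Everything here is synthetic and illustrative:

```javascript
// Toy acoustic echo cancellation with an NLMS adaptive filter.
// Conceptual sketch only - not the WebRTC AEC3 algorithm Raven actually uses.

function nlmsCancel(farEnd, mic, taps = 8, mu = 0.5, eps = 1e-6) {
  const w = new Array(taps).fill(0); // adaptive estimate of the echo path
  const out = [];
  for (let n = 0; n < mic.length; n++) {
    // Predicted echo: dot product of the filter with recent far-end samples.
    let yhat = 0, energy = eps;
    for (let k = 0; k < taps; k++) {
      const x = n - k >= 0 ? farEnd[n - k] : 0;
      yhat += w[k] * x;
      energy += x * x;
    }
    const e = mic[n] - yhat; // residual after echo removal
    out.push(e);
    // NLMS update, normalized by the far-end energy in the filter window.
    for (let k = 0; k < taps; k++) {
      const x = n - k >= 0 ? farEnd[n - k] : 0;
      w[k] += (mu * e * x) / energy;
    }
  }
  return out;
}

// Synthetic demo: a far-end signal echoed into the mic through a 2-tap path.
const far = Array.from({ length: 2000 }, (_, i) => Math.sin(i * 0.3) + Math.sin(i * 0.05));
const mic = far.map((_, i) => 0.6 * far[i] + 0.3 * (i > 0 ? far[i - 1] : 0));
const residual = nlmsCancel(far, mic);
const power = (s) => s.reduce((a, v) => a + v * v, 0) / s.length;
// Once the filter converges, the residual power sits far below the raw echo power.
```

A real canceller also has to handle clock drift, nonlinear speaker distortion, and double-talk - which is where the weeks of tuning go.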


We solved this with GStreamer and the WebRTC AEC3 engine - the same acoustic echo canceller that runs inside Chrome. We built a native C++ addon that creates a GStreamer pipeline: system audio feeds into webrtcechoprobe as a reference signal, mic audio runs through webrtcdsp which subtracts the echo using that reference. The output is two clean, separated streams.
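Conceptually, the topology looks something like this, in gst-launch notation. The source elements and property spellings are illustrative - the real pipeline is constructed programmatically inside the C++ addon:

```
# System audio branch: feeds the echo probe as the AEC reference signal.
system-audio-src ! audioconvert ! audioresample \
    ! webrtcechoprobe ! queue ! appsink name=system_out

# Mic branch: webrtcdsp subtracts the echo using the probe's reference.
mic-src ! audioconvert ! audioresample \
    ! webrtcdsp echo-cancel=true ! queue ! appsink name=mic_out
```

`webrtcdsp` and `webrtcechoprobe` are real elements from gst-plugins-bad; by default, `webrtcdsp` picks up the probe in the same pipeline as its reference.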


Getting this to work reliably across hardware configurations was the single most time-consuming engineering effort in the project. Sample rate mismatches between devices, latency alignment between the two streams, Bluetooth codec variability - each of these required weeks of tuning. The entire echo cancellation pipeline is roughly 800 lines of C++, compiled against Electron's Node.js runtime for ABI compatibility.


It's not perfect. AEC is inherently lossy. Aggressive suppression occasionally clips the edges of your sentences. But for real-time transcription purposes, it's reliable enough that you stop thinking about it, which is the goal.


Infrastructure: what runs where


Raven Pro follows a local-first philosophy. Audio capture and echo cancellation happen entirely on-device. Your API keys for transcription (Deepgram) and AI (Claude, GPT) are stored locally and sent directly from your machine to those services. We don't proxy your audio or your conversations through our servers.
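Concretely, "no proxy" means the client builds the request itself. A sketch of a direct connection to Deepgram's live-transcription WebSocket endpoint - `deepgramRequest` is an illustrative helper, not Raven's actual code, and the key comes from local storage on the user's machine:

```javascript
// Sketch: building a direct client-to-Deepgram connection, no middle server.
// deepgramRequest() is a hypothetical helper for illustration.

function deepgramRequest(apiKey, opts = {}) {
  // Deepgram's live streaming endpoint; query params select model and language.
  const params = new URLSearchParams({
    model: opts.model || 'nova-3',
    ...(opts.language ? { language: opts.language } : {}),
  });
  return {
    url: `wss://api.deepgram.com/v1/listen?${params}`,
    // The locally stored key travels straight from the user's machine to Deepgram.
    headers: { Authorization: `Token ${apiKey}` },
  };
}
```

Opening a WebSocket with these values and streaming the clean audio frames into it is all the "transcription backend" there is.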


The backend infrastructure we do run - deployed on AWS via Terraform - handles user authentication, session storage, and the sync layer for meeting history. The stack is ECS for compute, RDS for persistence, ALB for load balancing, and WAF for edge protection. We chose Terraform over CloudFormation because we need the same infrastructure definitions to work if we ever move to a multi-cloud setup, and because Terraform's plan/apply cycle makes infrastructure changes reviewable in the same way code changes are.
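For a sense of what goes through that plan/apply review, here is an illustrative HCL fragment - resource names and values are placeholders, not our actual Terraform:

```hcl
# Illustrative fragment only - names and arguments are placeholders.

resource "aws_ecs_service" "api" {
  name            = "raven-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 2

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "api"
    container_port   = 8080
  }
}
```

Changing `desired_count` here shows up as a one-line diff in `terraform plan`, which is exactly the reviewability we wanted.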


The app itself is distributed as a signed and notarized binary on macOS (we're using Apple's notarization pipeline via our developer enrollment) and as a standard installer on Windows. Auto-update is handled through Electron's built-in updater pointing at our release server.
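The update wiring in Electron is compact. A hedged sketch - the feed-URL layout and helper names are assumptions for illustration, not our actual release-server contract:

```javascript
// Sketch of wiring Electron's built-in autoUpdater to a release server.
// feedUrl() and the /update/<platform>/<version> layout are assumptions.

function feedUrl(baseUrl, platform, version) {
  // A common layout for update feeds: /update/<platform>/<current-version>
  return `${baseUrl}/update/${platform}/${version}`;
}

function wireUpdater(autoUpdater, app) {
  // 'https://updates.example.com' is a placeholder, not our real endpoint.
  autoUpdater.setFeedURL({
    url: feedUrl('https://updates.example.com', process.platform, app.getVersion()),
  });
  autoUpdater.on('update-downloaded', () => autoUpdater.quitAndInstall());
  autoUpdater.checkForUpdates();
}
```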


What we're building toward


Raven Pro today does three things well: real-time transcription of both sides of a conversation, echo cancellation that cleanly separates speakers, and mid-meeting AI assistance where you can ask Claude or GPT questions with the transcript as context.

What we're building toward is a meeting operating layer. Session history with full searchable transcripts. Custom AI modes - different system prompts for sales calls versus interviews versus standups. Document context injection, where you upload reference material and the AI uses it when answering questions mid-call. All of this running locally, with your own API keys, no accounts required for the core functionality.


The honest status: the core app works. Echo cancellation is stable. Transcription quality depends on Deepgram's Nova-3 model, which is excellent for English and solid for Hindi. The rough edges are in polish - installer UX, first-run onboarding, edge cases with unusual audio routing setups. We're iterating daily.


Why open source

Raven Pro's core is open source. The echo cancellation pipeline, the audio capture modules, the Electron shell - it's all on GitHub under Laxcorp Research. Users bring their own API keys.


The reasoning is straightforward: meeting transcription is a trust-sensitive domain. People say private things on calls. If you're going to run software that listens to every conversation you have, you should be able to read the source code and verify that audio isn't being exfiltrated. Open source is the only credible answer to that concern.


The business model, when we get there, will be around premium features built on top of the open core - team sync, analytics, integrations. But the thing that listens to your microphone will always be auditable.


Raven Pro is available for macOS and Windows at useraven.ai. Source code at github.com/Laxcorp-Research/project-raven.

 
 