Quickstart (WebSocket)

Get real‑time transcripts in minutes using the built‑in recorder and a WebSocket‑capable provider. This guide uses Deepgram, but any WebSocket provider works the same way.

  • Start a microphone recorder (normalized PCM frames)
  • Connect a WebSocket transcription stream
  • Receive partial and final transcripts

Prerequisites

  • HTTPS (or localhost)
  • Microphone permission
  • A provider key or short‑lived token endpoint

Install the packages:

```sh
# Required
pnpm add @saraudio/runtime-browser @saraudio/deepgram

# Optional stages (VAD + Meter)
pnpm add @saraudio/vad-energy @saraudio/meter
```

Simplest form: use a raw API key (fine for quick local testing and server‑side usage).

```typescript
import { deepgram } from '@saraudio/deepgram';

export const provider = deepgram({
  model: 'nova-3',
  auth: {
    apiKey: '<DEEPGRAM_API_KEY>',
  },
});
```

Note: For production browsers, prefer short‑lived tokens from your backend. We’ll cover this in a separate Auth guide.

Issue a short‑lived token on your server and use it via auth.getToken:

```typescript
type EphemeralTokenResponse = {
  access_token: string;
  expires_in: number; // seconds
};

let tokenCache: { value: string; expiresAt: number } | null = null;
const nowMs = () => Date.now();

async function getToken(): Promise<string> {
  // Reuse the cached token while it has more than 2 s of life left
  if (tokenCache && tokenCache.expiresAt - nowMs() > 2000) {
    return tokenCache.value;
  }
  const response = await fetch('/api/deepgram/token', { method: 'POST' });
  if (!response.ok) {
    throw new Error(`Failed to obtain Deepgram token (status ${response.status})`);
  }
  const body: EphemeralTokenResponse = await response.json();
  // Expire the cache slightly early so a stale token is never sent
  const safeTtlMs = Math.max(1, body.expires_in - 2) * 1000;
  tokenCache = { value: body.access_token, expiresAt: nowMs() + safeTtlMs };
  return body.access_token;
}

export const provider = deepgram({
  model: 'nova-3',
  auth: { getToken },
});
```
Now wire the provider into a recorder and a transcription controller:

```typescript
import { createRecorder, createTranscription } from '@saraudio/runtime-browser';
import { vadEnergy } from '@saraudio/vad-energy';
import { meter } from '@saraudio/meter';

const recorder = createRecorder({
  // Recommended: mono 16 kHz for low latency
  format: { sampleRate: 16000, channels: 1 },
  // Stages: VAD for speech events; Meter for level visualization
  stages: [
    vadEnergy({ thresholdDb: -50, attackMs: 80, releaseMs: 200 }),
    meter(),
  ],
  segmenter: true,
});

const ctrl = createTranscription({
  provider,
  recorder,
  transport: 'websocket',
  connection: {
    ws: { silencePolicy: 'keep' }, // 'keep' | 'drop' | 'mute'
  },
});

ctrl.onPartial((text) => console.log('partial:', text));
ctrl.onTranscript((r) => console.log('final:', r.text));
ctrl.onError((e) => console.error(e));

await recorder.start();
await ctrl.connect();

// Later: stop
// await ctrl.disconnect();
// await recorder.stop();
```
The silencePolicy option controls what the WebSocket sends during silence:

  • keep (default): send all frames (best quality, more bandwidth)
  • drop: send only during speech (based on VAD)
  • mute: keep cadence by sending zeroed frames in silence

Set it at controller creation via connection.ws.silencePolicy.
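For example, to save bandwidth on constrained links you could create the controller with the drop policy; this is the same createTranscription call as above with only the policy changed (a sketch, not a separate API):

```typescript
// Sketch: send audio only while VAD reports speech.
// Requires a VAD stage (e.g. vadEnergy) on the recorder so speech is detected.
const lowBandwidthCtrl = createTranscription({
  provider,
  recorder,
  transport: 'websocket',
  connection: {
    ws: { silencePolicy: 'drop' },
  },
});
```

Note that with drop, the provider receives no audio at all during silence, so provider‑side endpointing behavior may differ from the keep policy.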

Next steps

  • Concepts → Controller & Transport (policies, retries)
  • Providers → Deepgram / Soniox (WS options)