Quickstart (HTTP)

HTTP mode suits simple, cost-efficient transcription where final transcripts are enough. The live aggregator batches PCM frames into WAV and flushes either on a timer or at the end of a speech segment.
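To make the batching step concrete, here is a minimal sketch of how buffered PCM frames could be merged and wrapped in a WAV container before upload. It assumes mono 16-bit PCM; `encodeWav` and `framesToWav` are hypothetical helpers for illustration, not the library's internal code.

```typescript
// Illustrative sketch only: wraps mono 16-bit PCM in a minimal 44-byte WAV header.
// encodeWav and framesToWav are hypothetical names, not part of @saraudio.
function encodeWav(samples: Int16Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const channels = 1;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size = file size - 8
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // PCM payload, little-endian
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * bytesPerSample, samples[i], true);
  }
  return new Uint8Array(buffer);
}

// Concatenate the buffered frames, then encode them as one WAV payload.
function framesToWav(frames: Int16Array[], sampleRate: number): Uint8Array {
  const total = frames.reduce((n, f) => n + f.length, 0);
  const merged = new Int16Array(total);
  let offset = 0;
  for (const frame of frames) {
    merged.set(frame, offset);
    offset += frame.length;
  }
  return encodeWav(merged, sampleRate);
}
```

Each flush would then send one such payload as the body of an HTTP request. The real aggregator may differ in sample format and channel layout.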

What you’ll build

Start the recorder and gate audio by speech
Send HTTP chunks by timer or on segment end
Receive final transcripts

Tip: For “one request per phrase”, set intervalMs = 0 and flushOnSegmentEnd = true.

1) Install

# Required
pnpm add @saraudio/runtime-browser @saraudio/deepgram

# Optional stages (VAD + Meter)
pnpm add @saraudio/vad-energy @saraudio/meter

2) Provider (Deepgram example)

Use a raw API key for the quickest setup (ideal for server or local tests).

import { deepgram } from '@saraudio/deepgram';

export const provider = deepgram({
  model: 'nova-3',
  auth: {
    apiKey: '<DEEPGRAM_API_KEY>',
  },
});

Note: For browser apps in production, do not ship a raw API key to the client; issue short-lived tokens from your backend instead. We’ll cover this in the Auth guide.

3) Controller in HTTP mode

import { createRecorder, createTranscription } from '@saraudio/runtime-browser';
import { vadEnergy } from '@saraudio/vad-energy';
import { meter } from '@saraudio/meter';

const recorder = createRecorder({
  stages: [vadEnergy({ thresholdDb: -50 }), meter()],
  segmenter: true,
});

const ctrl = createTranscription({
  provider,
  recorder,
  transport: 'http',
  flushOnSegmentEnd: true, // enable segment‑only semantics
  connection: {
    http: {
      chunking: {
        intervalMs: 0, // no timer → flush on segment end (or forceEndpoint)
        minDurationMs: 700, // skip timer flushes shorter than this (not used when intervalMs = 0)
        overlapMs: 500, // tail continuity between chunks
        maxInFlight: 1, // limit concurrent HTTP requests
        timeoutMs: 10000, // per‑flush timeout
      },
    },
  },
});
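The way these options interact can be sketched as a small decision function. This is a hedged illustration of the semantics described in the comments above, not the controller's actual implementation; `ChunkingPolicy`, `FlushTrigger`, and `shouldFlush` are hypothetical names.

```typescript
// Hypothetical model of the flush rules; field names mirror the config above,
// but the logic is an assumption based on the comments in this guide.
interface ChunkingPolicy {
  intervalMs: number; // 0 disables the timer
  minDurationMs: number; // minimum buffered audio for a timer flush
  flushOnSegmentEnd: boolean; // flush when the segmenter closes a speech segment
}

type FlushTrigger = 'timer' | 'segmentEnd' | 'force';

function shouldFlush(
  policy: ChunkingPolicy,
  trigger: FlushTrigger,
  bufferedMs: number,
): boolean {
  if (bufferedMs <= 0) return false; // nothing buffered, nothing to send
  if (trigger === 'force') return true; // manual flush always goes through
  if (trigger === 'segmentEnd') return policy.flushOnSegmentEnd;
  // Timer flush: only when a timer is configured and enough audio is buffered.
  return policy.intervalMs > 0 && bufferedMs >= policy.minDurationMs;
}

// "One request per phrase": no timer, flush only when a segment ends.
const perPhrase: ChunkingPolicy = {
  intervalMs: 0,
  minDurationMs: 700,
  flushOnSegmentEnd: true,
};
```

Under this model, `perPhrase` never flushes on a timer tick, so `minDurationMs` has no effect until `intervalMs` is set above zero.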

4) Start and flush

ctrl.onUpdate((u) => {
  const text = u.tokens.map((t) => t.text).join('').trim();
  if (text) console.log('final:', text);
});
ctrl.onError((e) => console.error(e));

await recorder.start();
await ctrl.connect();

// Optional: manual flush at any time (e.g., on button click)
// await ctrl.forceEndpoint();

How VAD gating works

With flushOnSegmentEnd: true, the controller subscribes only to speech frames, so silence is dropped and never uploaded.
If you disable VAD, or a segment never ends, call forceEndpoint() to flush the audio buffered so far.
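As a rough illustration of energy-based gating, the sketch below computes a frame's RMS level in decibels and keeps only frames above a threshold. It assumes float samples in the range [-1, 1] and a threshold relative to full scale; `frameLevelDb`, `isSpeech`, and `gateFrames` are hypothetical helpers, and the actual @saraudio/vad-energy stage may use smoothing or hysteresis that is not shown here.

```typescript
// Hypothetical helpers for illustration; not the @saraudio/vad-energy implementation.

// RMS level of a frame in dB relative to full scale (1.0).
function frameLevelDb(frame: Float32Array): number {
  if (frame.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  const rms = Math.sqrt(sumSquares / frame.length);
  return rms > 0 ? 20 * Math.log10(rms) : -Infinity;
}

// A frame counts as speech when its level exceeds the threshold.
function isSpeech(frame: Float32Array, thresholdDb: number): boolean {
  return frameLevelDb(frame) > thresholdDb;
}

// Gate a stream of frames: silent frames are dropped before any upload.
function gateFrames(frames: Float32Array[], thresholdDb: number): Float32Array[] {
  return frames.filter((frame) => isSpeech(frame, thresholdDb));
}
```

With a threshold of -50, a frame whose samples sit around 0.1 (about -20 dB) passes the gate, while a frame around 0.001 (about -60 dB) is dropped.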

Next

Concepts → Controller & Transport (HTTP chunking in detail)
Providers → Deepgram / Soniox (HTTP support and models)