Soniox

Soniox provides two distinct paths:

WebSocket realtime (stt-rt-v3) — partials and finals with low latency.
HTTP batch via Files API (stt-async-v3) — upload → create job → poll → transcript.

Use raw model IDs. The same provider instance exposes both methods; the controller chooses the transport.

Install

# Required
pnpm add @saraudio/soniox

# Optional stages (VAD + Meter)
pnpm add @saraudio/vad-energy @saraudio/meter

Create a provider

import { soniox } from '@saraudio/soniox';

export const provider = soniox({
  model: 'stt-rt-v3', // or 'stt-async-v3' when targeting HTTP batch
  auth: { apiKey: '<SONIOX_API_KEY>' },
});

Tip: You can use the same instance for both transports; set transport on the controller per session.

Authentication

Soniox has different rules for WS vs REST. Do not expose a permanent API key in the browser.

WebSocket (realtime): issue a short‑lived temporary API key on your server using POST /v1/auth/temporary-api-key, then pass it in the first WS message as api_key. This is the recommended browser pattern for secure realtime streaming.
REST (Files/Transcriptions): requires a permanent project API key in Authorization: Bearer <key>. Temporary keys are not valid for REST and will return 401 unauthenticated.
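
A minimal server-side sketch of issuing the temporary key. Only the path /v1/auth/temporary-api-key comes from this guide. The base URL https://api.soniox.com, the request fields usage_type and expires_in_seconds, and the response fields api_key and expires_at are assumptions to verify against the Soniox API reference. The helper names are hypothetical, and fetchImpl is injected so the code does not depend on a particular runtime.

```typescript
// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface TemporaryKey {
  apiKey: string;
  expiresAt: string;
}

// Builds the request. Body field names are assumptions, not taken from this guide.
function buildTemporaryKeyRequest(permanentKey: string, ttlSeconds = 60) {
  return {
    url: 'https://api.soniox.com/v1/auth/temporary-api-key',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${permanentKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        usage_type: 'transcribe_websocket',
        expires_in_seconds: ttlSeconds,
      }),
    },
  };
}

// Sends the request with the permanent key and returns the short-lived key.
async function mintTemporaryKey(
  fetchImpl: FetchLike,
  permanentKey: string,
  ttlSeconds = 60,
): Promise<TemporaryKey> {
  const { url, init } = buildTemporaryKeyRequest(permanentKey, ttlSeconds);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`temporary key request failed: ${res.status}`);
  const data = (await res.json()) as { api_key: string; expires_at: string };
  return { apiKey: data.api_key, expiresAt: data.expires_at };
}
```

Call mintTemporaryKey(fetch, process.env.SONIOX_API_KEY) from a server route and return only the temporary key to the browser; the permanent key never leaves the server.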

How SARAUDIO maps this:

WS sends credentials inside the init JSON (api_key).
REST sets the Authorization header. Auth priority is getToken → token → apiKey. In browsers, prefer a getToken callback that returns the correct credential for the chosen transport.
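
The mapping above can be sketched as three small functions. This is an illustration of the described behavior, not the library's actual code: the function names are hypothetical, the getToken signature is assumed, and the model field in the init message is an assumption about the Soniox WebSocket config.

```typescript
interface AuthOptions {
  apiKey?: string;
  token?: string;
  getToken?: () => string | Promise<string>; // assumed signature
}

// Resolve one credential using the documented priority: getToken → token → apiKey.
async function resolveCredential(auth: AuthOptions): Promise<string> {
  if (auth.getToken) return await auth.getToken();
  if (auth.token) return auth.token;
  if (auth.apiKey) return auth.apiKey;
  throw new Error('no credential configured');
}

// WebSocket: the credential travels inside the first (init) JSON message.
function buildWsInit(credential: string, model: string): string {
  return JSON.stringify({ api_key: credential, model });
}

// REST: the credential travels in the Authorization header.
function buildRestHeaders(credential: string): Record<string, string> {
  return { Authorization: `Bearer ${credential}` };
}
```

Because getToken wins over the static fields, a browser app can leave apiKey unset and decide per transport which credential to return.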

Recommended patterns:

Realtime (browser): server endpoint returns a temporary API key; client uses auth.getToken, transport 'websocket'.
Batch REST: call Soniox from your server (or via server proxy routes) with a permanent key; avoid calling /v1/files and /v1/transcriptions directly from the browser.

If you use a temporary key against REST, Soniox will respond with 401 unauthenticated — this is expected.
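
For the realtime browser pattern, a getToken callback might cache the temporary key until shortly before it expires. This is a sketch under assumptions: createGetToken is a hypothetical helper, getToken is assumed to be () => Promise<string>, the server route is assumed to return { apiKey, expiresAt }, and temporary keys are assumed to be reusable until expiry (drop the cache if yours are not).

```typescript
type TokenResponse = { apiKey: string; expiresAt: string };

// Returns a getToken callback that reuses a temporary key until it is
// within safetyMarginMs of expiring, then fetches a fresh one.
function createGetToken(
  fetchKey: () => Promise<TokenResponse>,
  now: () => number = Date.now,
  safetyMarginMs = 5_000,
): () => Promise<string> {
  let cached: TokenResponse | undefined;
  return async () => {
    if (cached && Date.parse(cached.expiresAt) - now() > safetyMarginMs) {
      return cached.apiKey;
    }
    cached = await fetchKey();
    return cached.apiKey;
  };
}

// Usage sketch ('/api/soniox-token' is a hypothetical route on your server):
// const getToken = createGetToken(async () =>
//   (await fetch('/api/soniox-token', { method: 'POST' })).json());
// const provider = soniox({ model: 'stt-rt-v3', auth: { getToken } });
```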

WebSocket quickstart (realtime)

import { createRecorder, createTranscription } from '@saraudio/runtime-browser';
import { soniox } from '@saraudio/soniox';
import { vadEnergy } from '@saraudio/vad-energy';
import { meter } from '@saraudio/meter';

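// Demo only: in a real browser app, pass auth.getToken that returns a temporary key (see Authentication).
const provider = soniox({ model: 'stt-rt-v3', auth: { apiKey: '<KEY>' } });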

const recorder = createRecorder({
  format: { sampleRate: 16000, channels: 1 },
  stages: [vadEnergy({ thresholdDb: -50 }), meter()],
  segmenter: true,
});

const ctrl = createTranscription({
  provider,
  recorder,
  transport: 'websocket',
  connection: { ws: { silencePolicy: 'keep' } },
});

ctrl.onPartial((t) => console.log('partial:', t));
ctrl.onTranscript((r) => console.log('final:', r.text));

await recorder.start();
await ctrl.connect();

Notes

Soniox streams results as tokens; the provider coalesces them into partial text for you.
Use silencePolicy: 'drop' to send frames only during speech.
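
To illustrate what coalescing tokens into text involves, here is a minimal sketch. It assumes each server message carries tokens shaped like { text, is_final }, that final tokens are sent once, and that non-final tokens are re-sent (possibly revised) in every message. TokenCoalescer is a hypothetical name, and the provider's real implementation may differ.

```typescript
interface SonioxToken {
  text: string;
  is_final: boolean;
}

class TokenCoalescer {
  private finalText = '';

  // Feed the tokens of one server message; returns the current best transcript.
  push(tokens: SonioxToken[]): string {
    let pending = '';
    for (const t of tokens) {
      if (t.is_final) {
        this.finalText += t.text; // final tokens are appended permanently
      } else {
        pending += t.text; // non-final tokens are valid for this message only
      }
    }
    return this.finalText + pending;
  }
}

const coalescer = new TokenCoalescer();
coalescer.push([{ text: 'Hel', is_final: false }]);
coalescer.push([
  { text: 'Hello', is_final: true },
  { text: ' wor', is_final: false },
]);
```

The text returned by each push is what a partial callback would show; once all tokens in a phrase are final, the same string serves as the final transcript.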

HTTP quickstart (Files API batch)

The controller’s HTTP path calls provider.transcribe() for each chunk. The Soniox provider maps this to Files API:

upload → create transcription job → poll → fetch transcript.

import { createRecorder, createTranscription } from '@saraudio/runtime-browser';
import { soniox } from '@saraudio/soniox';
import { vadEnergy } from '@saraudio/vad-energy';
import { meter } from '@saraudio/meter';

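// REST needs a permanent key: run this against your server proxy rather than shipping the key to the browser.
const provider = soniox({ model: 'stt-async-v3', auth: { apiKey: '<KEY>' } });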

const recorder = createRecorder({ stages: [vadEnergy({ thresholdDb: -50 }), meter()], segmenter: true });

const ctrl = createTranscription({
  provider,
  recorder,
  transport: 'http',
  flushOnSegmentEnd: true, // pair with intervalMs: 0 for one request per phrase
  connection: {
    http: { chunking: { intervalMs: 0, overlapMs: 500, maxInFlight: 1, timeoutMs: 30_000 } },
  },
});

ctrl.onTranscript((r) => console.log('final:', r.text));

await recorder.start();
await ctrl.connect();

Notes

Use the async model (stt-async-v3) for HTTP batch. The realtime model (stt-rt-v3) is for WebSocket only.
Batch jobs incur additional latency (upload + processing). For live UX prefer WS.
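
The upload → create job → poll → transcript flow that the provider wraps can be sketched as follows, for example inside a server route. Only /v1/files and /v1/transcriptions are named in this guide. The response fields (id, status, error_message, text), the status values, and the /transcript sub-path are assumptions to check against the Soniox API reference. The injected http helper is hypothetical; it is expected to add the Authorization header, encode the upload as multipart, and return parsed JSON.

```typescript
// Injected helper: performs an authenticated request and returns parsed JSON.
type Http = (method: 'GET' | 'POST', path: string, body?: unknown) => Promise<any>;

async function transcribeViaFiles(
  http: Http,
  audio: Uint8Array,
  opts: { model?: string; pollMs?: number; maxPolls?: number } = {},
): Promise<string> {
  const { model = 'stt-async-v3', pollMs = 1000, maxPolls = 60 } = opts;

  // 1. Upload the audio; the response is assumed to carry the new file id.
  const file = await http('POST', '/v1/files', audio);

  // 2. Create a transcription job for that file.
  const job = await http('POST', '/v1/transcriptions', { file_id: file.id, model });

  // 3. Poll until the job completes, fails, or the poll budget runs out.
  for (let i = 0; i < maxPolls; i++) {
    const state = await http('GET', `/v1/transcriptions/${job.id}`);
    if (state.status === 'completed') {
      // 4. Fetch the finished transcript.
      const result = await http('GET', `/v1/transcriptions/${job.id}/transcript`);
      return result.text;
    }
    if (state.status === 'error') {
      throw new Error(state.error_message ?? 'transcription failed');
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error('transcription timed out');
}
```

Each chunk pays for the upload, the queue wait, and at least one poll interval, which is why this path suits phrase-sized or offline audio better than live captions.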

Options (Soniox)

model: 'stt-rt-v3' | 'stt-async-v3' — realtime vs async (batch REST).
sampleRate?: number — preferred sample rate; default 16000.
channels?: 1 | 2 — channel count; default 1.
audioFormat?: 'pcm_s16le' | 'auto' | string — initial config for WS; default pcm_s16le.
languageHints?: string[] — optional list like ['en','es'].
queueBudgetMs?: number — drop‑oldest send queue budget for WS; default 200ms (clamped [100..500]).
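
To make queueBudgetMs concrete, here is a sketch of a drop-oldest send queue bounded by audio duration. It illustrates the idea only: DropOldestQueue and QueuedFrame are hypothetical names, and the provider's internal queue may be implemented differently.

```typescript
interface QueuedFrame {
  data: Uint8Array;
  durationMs: number;
}

class DropOldestQueue {
  private frames: QueuedFrame[] = [];
  private queuedMs = 0;
  readonly budgetMs: number;

  constructor(budgetMs = 200) {
    // Clamp to the documented [100..500] ms range.
    this.budgetMs = Math.min(500, Math.max(100, budgetMs));
  }

  // Enqueue a frame, then discard the oldest frames while the queued audio
  // exceeds the budget. Returns the number of frames dropped.
  push(frame: QueuedFrame): number {
    this.frames.push(frame);
    this.queuedMs += frame.durationMs;
    let dropped = 0;
    while (this.queuedMs > this.budgetMs && this.frames.length > 1) {
      this.queuedMs -= this.frames.shift()!.durationMs;
      dropped++;
    }
    return dropped;
  }

  // Dequeue the next frame to send over the socket.
  shift(): QueuedFrame | undefined {
    const frame = this.frames.shift();
    if (frame) this.queuedMs -= frame.durationMs;
    return frame;
  }

  get size(): number {
    return this.frames.length;
  }
}
```

Dropping the oldest audio keeps the stream close to real time when the socket stalls, at the cost of losing a short stretch of speech; a larger budget trades latency for completeness.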

Common provider options

auth: { apiKey?; token?; getToken? }
baseUrl: string | builder per transport
headers, query, wsProtocols

Errors & retries

401/403 → AuthenticationError
429 → RateLimitError (uses Retry‑After when present)
Other HTTP errors → ProviderError
WS close with error JSON (error_code/error_message) is mapped to the proper error type.
Controller: WebSocket connections are retried with backoff; each HTTP flush has its own per-request timeout.

ctrl.onError((e) => {
  if (e.name === 'RateLimitError') console.warn('rate limited');
  else console.error('soniox error', e);
});
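
The status mapping and retry behavior listed above could look roughly like this. SketchError is a local stand-in, not the library's error classes; only the error names and the use of Retry-After come from this guide. The backoff base and cap are assumed values.

```typescript
// Local stand-in error type that carries the same names as the documented errors.
class SketchError extends Error {
  constructor(
    name: string,
    message: string,
    readonly status: number,
    readonly retryAfterMs?: number,
  ) {
    super(message);
    this.name = name;
  }
}

// Parse a Retry-After header given in seconds; the HTTP-date form is ignored here.
function parseRetryAfterMs(header?: string | null): number | undefined {
  if (!header) return undefined;
  const seconds = Number(header);
  return Number.isFinite(seconds) ? Math.max(0, seconds * 1000) : undefined;
}

// Map an HTTP status to the documented error names.
function mapHttpError(status: number, retryAfter?: string | null): SketchError {
  if (status === 401 || status === 403) {
    return new SketchError('AuthenticationError', 'invalid or missing credential', status);
  }
  if (status === 429) {
    return new SketchError('RateLimitError', 'rate limited', status, parseRetryAfterMs(retryAfter));
  }
  return new SketchError('ProviderError', `request failed with status ${status}`, status);
}

// Exponential backoff for WebSocket reconnects; base and cap are assumptions.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In an onError handler, a RateLimitError with retryAfterMs set can be used to delay the next flush instead of retrying immediately.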

Models

import { SONIOX_REALTIME_MODELS, SONIOX_ASYNC_MODELS } from '@saraudio/soniox';

// ['stt-rt-v3'] and ['stt-async-v3']
console.log(SONIOX_REALTIME_MODELS, SONIOX_ASYNC_MODELS);

Pick stt-rt-v3 for WS realtime or stt-async-v3 for HTTP batch.

Practical tips

Prefer WS for live captions/partials; use HTTP for async jobs or for phrase-based UX that flushes only at segment end (flushOnSegmentEnd: true with intervalMs: 0).
Keep audio mono at 16 kHz for low latency; the controller negotiates formats with the provider.
For big files prefer server‑side batch pipelines (upload → job → webhook/poll → storage).