Soniox

Soniox provides two distinct paths:

  • WebSocket realtime (stt-rt-v3) — partials and finals with low latency.
  • HTTP batch via Files API (stt-async-v3) — upload → create job → poll → transcript.

Use raw model IDs. The same provider instance exposes both methods; the controller chooses the transport.

# Required
pnpm add @saraudio/soniox
# Optional stages (VAD + Meter)
pnpm add @saraudio/vad-energy @saraudio/meter

import { soniox } from '@saraudio/soniox';

export const provider = soniox({
  model: 'stt-rt-v3', // or 'stt-async-v3' when targeting HTTP batch
  auth: { apiKey: '<SONIOX_API_KEY>' },
});

Tip: You can use the same instance for both transports; set transport on the controller per session.


Soniox has different rules for WS vs REST. Do not expose a permanent API key in the browser.

  • WebSocket (realtime): issue a short‑lived temporary API key on your server using POST /v1/auth/temporary-api-key, then pass it in the first WS message as api_key. This is the recommended browser pattern for secure realtime streaming.
  • REST (Files/Transcriptions): requires a permanent project API key in Authorization: Bearer <key>. Temporary keys are not valid for REST and will return 401 unauthenticated.

How SARAUDIO maps this:

  • WS sends credentials inside the init JSON (api_key).
  • REST sets the Authorization header. Auth priority is getToken → token → apiKey. In browsers prefer getToken that returns the correct credential for the chosen transport.
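
The priority order above can be sketched as a small resolver. This is only an illustration, not SARAUDIO's actual implementation: the `AuthOptions` shape and the `resolveCredential` name are hypothetical, and `getToken` is assumed to return a string or a promise of one.

```typescript
// Hypothetical shape mirroring the `auth` option; not the library's actual type.
interface AuthOptions {
  apiKey?: string;
  token?: string;
  getToken?: () => string | Promise<string>;
}

// Resolve a credential using the documented priority: getToken → token → apiKey.
async function resolveCredential(auth: AuthOptions): Promise<string> {
  if (auth.getToken) return await auth.getToken();
  if (auth.token !== undefined) return auth.token;
  if (auth.apiKey !== undefined) return auth.apiKey;
  throw new Error('No credential configured: set getToken, token, or apiKey');
}
```

Because `getToken` wins whenever it is set, a browser app can leave `apiKey` out entirely and let the callback decide which credential fits the current transport.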

Recommended patterns:

  • Realtime (browser): server endpoint returns a temporary API key; client uses auth.getToken, transport 'websocket'.
  • Batch REST: call Soniox from your server (or via server proxy routes) with a permanent key; avoid calling /v1/files and /v1/transcriptions directly from the browser.

If you use a temporary key against REST, Soniox will respond with 401 unauthenticated — this is expected.
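
A server-side helper for issuing the temporary key might look like the sketch below. The endpoint path comes from the text above; the base URL `https://api.soniox.com`, the request fields (`usage_type`, `expires_in_seconds`), and the response field (`api_key`) are assumptions that should be checked against the Soniox API reference. `issueTemporaryKey` and `FetchLike` are hypothetical names, and `fetchImpl` is injected so the sketch does not depend on a particular runtime.

```typescript
// Minimal fetch-like signature; the global fetch in Node 18+ or browsers fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Server-side only: exchanges the permanent key for a short-lived one.
// Request and response field names are assumptions (see the note above).
async function issueTemporaryKey(
  fetchImpl: FetchLike,
  permanentKey: string,
  expiresInSeconds = 60,
): Promise<string> {
  const res = await fetchImpl('https://api.soniox.com/v1/auth/temporary-api-key', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${permanentKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      usage_type: 'transcribe_websocket',
      expires_in_seconds: expiresInSeconds,
    }),
  });
  if (!res.ok) throw new Error(`Temporary key request failed: ${res.status}`);
  const data = (await res.json()) as { api_key?: string };
  if (!data.api_key) throw new Error('Response did not include api_key');
  return data.api_key;
}
```

The browser would then call your own endpoint from `auth.getToken` and never see the permanent key.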


import { createRecorder, createTranscription } from '@saraudio/runtime-browser';
import { soniox } from '@saraudio/soniox';
import { vadEnergy } from '@saraudio/vad-energy';
import { meter } from '@saraudio/meter';

// '<KEY>' is a placeholder. In the browser, prefer auth.getToken returning a temporary key.
const provider = soniox({ model: 'stt-rt-v3', auth: { apiKey: '<KEY>' } });

// Mono 16 kHz capture with energy-based VAD and a level meter.
const recorder = createRecorder({
  format: { sampleRate: 16000, channels: 1 },
  stages: [vadEnergy({ thresholdDb: -50 }), meter()],
  segmenter: true,
});

const ctrl = createTranscription({
  provider,
  recorder,
  transport: 'websocket',
  connection: { ws: { silencePolicy: 'keep' } },
});

ctrl.onPartial((t) => console.log('partial:', t));
ctrl.onTranscript((r) => console.log('final:', r.text));

await recorder.start();
await ctrl.connect();

Notes

  • Soniox streams results as tokens; the provider coalesces them into partial text for you.
  • Use silencePolicy: 'drop' to send frames only during speech.
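
To make the coalescing step concrete, here is a minimal sketch of how streamed tokens could be folded into text. It is not the provider's actual code. It assumes each WS message carries a `tokens` array whose items have `text` and `is_final`, that final tokens are delivered once, and that non-final tokens are superseded by later messages. `TokenCoalescer` is a hypothetical name.

```typescript
// Assumed token shape; verify field names against the Soniox WebSocket reference.
interface SonioxToken {
  text: string;
  is_final: boolean;
}

class TokenCoalescer {
  private finalText = '';

  // Commits final tokens and returns committed text plus the latest non-final tail.
  push(tokens: SonioxToken[]): string {
    let tail = '';
    for (const t of tokens) {
      if (t.is_final) this.finalText += t.text;
      else tail += t.text;
    }
    return this.finalText + tail;
  }
}
```

Each call returns the text a partial callback would display; the non-final tail is rebuilt from every message, so revised guesses replace older ones instead of piling up.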

The controller’s HTTP path calls provider.transcribe() for each chunk. The Soniox provider maps this to Files API:

upload → create transcription job → poll → fetch transcript.
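
The four steps can be sketched as a generic polling loop. This is an illustration under stated assumptions, not the provider's implementation: `BatchSteps` and `transcribeBatch` are hypothetical names, the status values are assumed, and a real client would back each step with server-side requests to /v1/files and /v1/transcriptions.

```typescript
// Hypothetical step functions, injected so the flow can be exercised without network access.
interface BatchSteps {
  upload(audio: Uint8Array): Promise<string>; // resolves to a file id
  createJob(fileId: string): Promise<string>; // resolves to a job id
  getStatus(jobId: string): Promise<'queued' | 'processing' | 'completed' | 'error'>;
  fetchTranscript(jobId: string): Promise<string>;
  sleep(ms: number): Promise<void>;
}

// upload → create job → poll until done → fetch transcript.
async function transcribeBatch(
  steps: BatchSteps,
  audio: Uint8Array,
  pollMs = 1000,
  maxPolls = 60,
): Promise<string> {
  const fileId = await steps.upload(audio);
  const jobId = await steps.createJob(fileId);
  for (let i = 0; i < maxPolls; i++) {
    const status = await steps.getStatus(jobId);
    if (status === 'completed') return steps.fetchTranscript(jobId);
    if (status === 'error') throw new Error(`Transcription job ${jobId} failed`);
    await steps.sleep(pollMs);
  }
  throw new Error(`Transcription job ${jobId} did not finish after ${maxPolls} polls`);
}
```

Bounding the loop with `maxPolls` keeps a stuck job from polling forever, which matters when each chunk triggers its own job.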

import { createRecorder, createTranscription } from '@saraudio/runtime-browser';
import { soniox } from '@saraudio/soniox';
import { vadEnergy } from '@saraudio/vad-energy';
import { meter } from '@saraudio/meter';

const provider = soniox({ model: 'stt-async-v3', auth: { apiKey: '<KEY>' } });

const recorder = createRecorder({
  stages: [vadEnergy({ thresholdDb: -50 }), meter()],
  segmenter: true,
});

const ctrl = createTranscription({
  provider,
  recorder,
  transport: 'http',
  flushOnSegmentEnd: true, // pair with intervalMs: 0 for one request per phrase
  connection: {
    http: { chunking: { intervalMs: 0, overlapMs: 500, maxInFlight: 1, timeoutMs: 30_000 } },
  },
});

ctrl.onTranscript((r) => console.log('final:', r.text));

await recorder.start();
await ctrl.connect();

Notes

  • Use the async model (stt-async-v3) for HTTP batch. The realtime model (stt-rt-v3) is for WebSocket only.
  • Batch jobs incur additional latency (upload + processing). For live UX prefer WS.

  • model: 'stt-rt-v3' | 'stt-async-v3' — realtime vs async (batch REST).
  • sampleRate?: number — preferred sample rate; default 16000.
  • channels?: 1 | 2 — channel count; default 1.
  • audioFormat?: 'pcm_s16le' | 'auto' | string — initial config for WS; default pcm_s16le.
  • languageHints?: string[] — optional list like ['en','es'].
  • queueBudgetMs?: number — drop‑oldest send queue budget for WS; default 200ms (clamped [100..500]).
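
To illustrate what a drop-oldest budget means for queueBudgetMs, here is a minimal sketch. It is not the provider's internal queue: `DropOldestQueue` and `QueuedFrame` are hypothetical names, and only the 200 ms default and the [100..500] clamp come from the option description above.

```typescript
interface QueuedFrame {
  data: Uint8Array;
  durationMs: number;
}

class DropOldestQueue {
  private frames: QueuedFrame[] = [];
  private queuedMs = 0;
  readonly budgetMs: number;

  constructor(budgetMs = 200) {
    // Clamp to the documented range [100..500].
    this.budgetMs = Math.min(500, Math.max(100, budgetMs));
  }

  // Enqueues a frame, then evicts the oldest frames until the queue fits the budget.
  // Returns how many frames were dropped.
  push(frame: QueuedFrame): number {
    this.frames.push(frame);
    this.queuedMs += frame.durationMs;
    let dropped = 0;
    while (this.queuedMs > this.budgetMs && this.frames.length > 1) {
      const oldest = this.frames.shift()!;
      this.queuedMs -= oldest.durationMs;
      dropped++;
    }
    return dropped;
  }

  // Removes and returns the oldest queued frame, if any.
  shift(): QueuedFrame | undefined {
    const frame = this.frames.shift();
    if (frame) this.queuedMs -= frame.durationMs;
    return frame;
  }

  get size(): number {
    return this.frames.length;
  }
}
```

Dropping the oldest audio first favors recency: when the socket stalls, the transcript skips a little instead of falling further and further behind the speaker.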

Common provider options

  • auth: { apiKey?; token?; getToken? }
  • baseUrl: string | builder per transport
  • headers, query, wsProtocols

  • 401/403 → AuthenticationError
  • 429 → RateLimitError (uses Retry‑After when present)
  • Other HTTP errors → ProviderError
  • WS close with error JSON (error_code/error_message) is mapped to the proper error type.
  • The controller retries WS connections with backoff and applies a flush timeout to each HTTP request.

ctrl.onError((e) => {
  if (e.name === 'RateLimitError') console.warn('rate limited');
  else console.error('soniox error', e);
});
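
The status-to-error mapping listed above could be sketched as follows. The classes here are local stand-ins that reuse the documented error names; they are not the types exported by SARAUDIO, and `mapHttpError` and `parseRetryAfterMs` are hypothetical helpers. The sketch reads Retry-After only as a number of seconds, although the header may also carry an HTTP date.

```typescript
// Local stand-ins named after the documented error types.
class ProviderError extends Error {
  status?: number;
  constructor(message: string, status?: number) {
    super(message);
    this.name = 'ProviderError';
    this.status = status;
  }
}

class AuthenticationError extends ProviderError {
  constructor(message: string, status?: number) {
    super(message, status);
    this.name = 'AuthenticationError';
  }
}

class RateLimitError extends ProviderError {
  retryAfterMs?: number;
  constructor(message: string, status?: number, retryAfterMs?: number) {
    super(message, status);
    this.name = 'RateLimitError';
    this.retryAfterMs = retryAfterMs;
  }
}

// Parses a Retry-After value given in seconds; returns undefined for anything else.
function parseRetryAfterMs(header?: string | null): number | undefined {
  if (header === undefined || header === null || header.trim() === '') return undefined;
  const seconds = Number(header);
  return isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;
}

function mapHttpError(status: number, retryAfter?: string | null): ProviderError {
  if (status === 401 || status === 403) {
    return new AuthenticationError(`Authentication failed (${status})`, status);
  }
  if (status === 429) {
    return new RateLimitError('Rate limited', status, parseRetryAfterMs(retryAfter));
  }
  return new ProviderError(`Provider request failed (${status})`, status);
}
```

Setting `name` explicitly is what lets handlers branch on `e.name`, as the `onError` example does.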

import { SONIOX_REALTIME_MODELS, SONIOX_ASYNC_MODELS } from '@saraudio/soniox';
// ['stt-rt-v3'] and ['stt-async-v3']
console.log(SONIOX_REALTIME_MODELS, SONIOX_ASYNC_MODELS);

Pick stt-rt-v3 for WS realtime or stt-async-v3 for HTTP batch.
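
If the model ID arrives from configuration, a small lookup can derive the matching transport. This is a hypothetical helper, not part of `@saraudio/soniox`; it hardcodes the two model IDs named in this guide.

```typescript
type Transport = 'websocket' | 'http';

// Model IDs as listed in this guide; extend the map if more models are added.
const MODEL_TRANSPORT: Record<string, Transport> = {
  'stt-rt-v3': 'websocket',
  'stt-async-v3': 'http',
};

function transportForModel(model: string): Transport {
  const transport = MODEL_TRANSPORT[model];
  if (!transport) throw new Error(`Unknown Soniox model: ${model}`);
  return transport;
}
```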


  • Prefer WS for live captions and partials; use HTTP for async jobs or for phrase-based UX that sends one request per segment.
  • Keep mono/16 kHz for low latency; the hook negotiates formats with the provider.
  • For big files prefer server‑side batch pipelines (upload → job → webhook/poll → storage).

See also

  • Getting Started → Quickstart (WebSocket), Quickstart (HTTP), Quickstart (Vue + WS)
  • Concepts → Controller & Transport