Speech AI Real-time v4 - SLNG Documentation

Messages

{
  "type": "config",
  "model": "stt-rt-v4",
  "audio_format": "pcm_s16le",
  "sample_rate": 16000,
  "num_channels": 1,
  "enable_speaker_diarization": true,
  "enable_endpoint_detection": true,
  "max_endpoint_delay_ms": 500,
  "language_hints": [
    "en"
  ]
}

"0000FF00000000FF00000000010101010101010100000000FFFFFFFFFFFFFEFEFDFEFEFEFEFDFDFEFEFEFEFEFEFEFEFEFFFFFFFFFEFEFEFEFF000100001020303030303030303030201010000FFFFFEFDFDFDFDFEFFFFFFFF0001020303020201000000FFFDFCFBFAFAFBFAF9F8F7F7F7F6F6F4F2F2F3F7FC000406090F14191A19181715110E0A05FEF9F6F3F0EEECEBEBECEEF2F6F9FC0005090D0F101010100E0C080401"

WSS

stt

soniox

speech-ai:rt-v4

bearer

type:http

API key issued by SLNG. Pass as Authorization: Bearer <token> in the WebSocket upgrade request headers.

method

type:string

GET

headers

type:object

X-World-Part-Override

type:enum

Overrides the target world part. If omitted, the world part is selected automatically.

Available options: ap, eu, na
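
The authentication and header rules above can be sketched as a small helper that assembles the headers for the WebSocket upgrade request. This is only an illustration: the function name `build_upgrade_headers` is hypothetical, and how the resulting dictionary is handed to a WebSocket client depends on the library you use.

```python
# Options listed in this reference for X-World-Part-Override.
VALID_WORLD_PARTS = ("ap", "eu", "na")


def build_upgrade_headers(api_key, world_part=None):
    """Return headers for the WebSocket upgrade (GET) request.

    Hypothetical helper, not part of any SLNG SDK.
    """
    if not api_key:
        raise ValueError("an SLNG API key is required")
    headers = {"Authorization": f"Bearer {api_key}"}
    # The override is optional; when omitted, the world part is auto-selected.
    if world_part is not None:
        if world_part not in VALID_WORLD_PARTS:
            raise ValueError(f"world part must be one of {VALID_WORLD_PARTS}")
        headers["X-World-Part-Override"] = world_part
    return headers
```

Validating the override on the client side surfaces a typo before the connection attempt rather than as a rejected upgrade.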

Init Request (Soniox)

type:object

Initialize a Soniox STT session with provider-specific recognition configuration.
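
As a rough sketch, the init message could be built and serialized as follows. The field names and values mirror the example config message on this page; the helper name `build_config_message` and its parameters are hypothetical, and the assumption is that the message is sent as a JSON text frame right after the connection opens.

```python
import json


def build_config_message(language_hints=("en",), sample_rate=16000, num_channels=1):
    """Serialize an init message mirroring the example in this reference.

    Hypothetical helper; only the fields shown in the example are included.
    """
    config = {
        "type": "config",
        "model": "stt-rt-v4",
        "audio_format": "pcm_s16le",
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        "enable_speaker_diarization": True,
        "enable_endpoint_detection": True,
        "max_endpoint_delay_ms": 500,
        "language_hints": list(language_hints),
    }
    return json.dumps(config)
```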

Binary Audio Frame

type:string

Stream raw audio to be transcribed. Audio is sent as binary WebSocket frames, not JSON: no envelope and no base64 encoding. The frame format must match the audio_format, sample_rate, and num_channels declared in the init message (default: pcm_s16le at 16 kHz, mono).
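
To illustrate the frame format, here is a minimal sketch that splits a buffer of pcm_s16le audio into raw byte chunks suitable for sending as binary WebSocket frames. The 100 ms frame duration and both function names are assumptions chosen for illustration; this page does not prescribe a frame size.

```python
def frame_size_bytes(sample_rate=16000, num_channels=1, frame_ms=100):
    """Bytes in one frame of pcm_s16le audio (2 bytes per sample)."""
    bytes_per_sample = 2  # signed 16-bit little-endian
    return sample_rate * num_channels * bytes_per_sample * frame_ms // 1000


def iter_audio_frames(pcm, sample_rate=16000, num_channels=1, frame_ms=100):
    """Yield consecutive raw byte chunks; the last one may be shorter.

    Each chunk is sent as-is: no JSON envelope, no base64 encoding.
    """
    size = frame_size_bytes(sample_rate, num_channels, frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

With the default 16 kHz mono settings, a 100 ms frame is 3200 bytes, so one second of audio yields ten frames.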

Finalize Message

type:object

Force-finalize buffered audio tokens without closing the connection.

Keepalive Message

type:object

Keep the WebSocket connection alive during silence.

Close Message

type:object

Signal end of audio stream.

Tokens Frame (Soniox native)

type:object

Native Soniox tokens frame forwarded unchanged from the upstream provider.
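
A minimal sketch of consuming a tokens frame follows. It assumes the frame is a JSON object with a `tokens` list whose entries carry `text` and `is_final` fields, as in Soniox's real-time responses; this page does not spell out the schema, so treat those field names and the helper `split_tokens` as assumptions to verify against the provider's documentation.

```python
import json


def split_tokens(frame_json):
    """Return (final_text, interim_text) from one tokens frame.

    Assumed frame shape: {"tokens": [{"text": ..., "is_final": ...}, ...]}.
    """
    frame = json.loads(frame_json)
    final_parts, interim_parts = [], []
    for token in frame.get("tokens", []):
        # Final tokens are stable; non-final ones may be revised later.
        target = final_parts if token.get("is_final") else interim_parts
        target.append(token.get("text", ""))
    return "".join(final_parts), "".join(interim_parts)
```

Keeping final and interim text separate lets a client append final text to the transcript while redrawing only the interim tail.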

Error Frame (Soniox native)

type:object

Native Soniox error frame forwarded unchanged from the upstream provider.
