Speech AI Real-time v4
Stream real-time transcripts from Soniox Speech AI v4 over WebSocket with speaker diarization, language detection, and configurable endpoint detection.
Config (client to server, JSON):
{
  "type": "config",
  "model": "stt-rt-v4",
  "audio_format": "pcm_s16le",
  "sample_rate": 16000,
  "num_channels": 1,
  "enable_speaker_diarization": true,
  "enable_endpoint_detection": true,
  "max_endpoint_delay_ms": 500,
  "language_hints": [
    "en"
  ]
}

Audio (client to server, binary frame, shown here as hex):
"0000FF00000000FF00000000010101010101010100000000FFFFFFFFFFFFFEFEFDFEFEFEFEFDFDFEFEFEFEFEFEFEFEFEFFFFFFFFFEFEFEFEFF000100001020303030303030303030201010000FFFFFEFDFDFDFDFEFFFFFFFF0001020303020201000000FFFDFCFBFAFAFBFAF9F8F7F7F7F6F6F4F2F2F3F7FC000406090F14191A19181715110E0A05FEF9F6F3F0EEECEBEBECEEF2F6F9FC0005090D0F101010100E0C080401"

Finalize (client to server, JSON):
{
  "type": "finalize"
}

Keepalive (client to server, JSON):
{
  "type": "keepalive"
}

End of stream (client to server, JSON):
{
  "type": ""
}

Tokens (server to client, JSON):
{
  "tokens": [],
  "final_audio_proc_ms": 0,
  "total_audio_proc_ms": 0
}

Error (server to client, JSON):
{
  "error_code": 401,
  "error_message": "Invalid or expired API key"
}

API key issued by SLNG. Pass as Authorization: Bearer <token> in the WebSocket upgrade request headers.
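As a minimal sketch of the authentication step, the helper below builds the header to attach to the WebSocket upgrade request. The function name and the placeholder key are illustrative and not part of the SLNG API. How the headers are handed to the connection depends on the WebSocket client library in use.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers to send with the WebSocket upgrade request.

    Hypothetical helper: only the `Authorization: Bearer <token>` format
    comes from the documentation above.
    """
    if not api_key or api_key != api_key.strip():
        raise ValueError("API key must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {api_key}"}


# Placeholder value; substitute the key issued by SLNG.
headers = build_auth_headers("YOUR_SLNG_API_KEY")
```

An invalid or expired key is expected to surface as the 401 error frame shown in the examples.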
Target world part override. Auto-selected if not provided. Available world parts: ap, eu, na.
Initialize a Soniox STT session with provider-specific recognition configuration.
Stream raw binary audio frames to be transcribed. Sent as binary WebSocket frames (NOT JSON) — no envelope, no base64 encoding. Frame format must match the audio_format, sample_rate, and num_channels declared in the init message (default pcm_s16le at 16kHz mono).
Force-finalize buffered audio tokens without closing the connection.
Keep the WebSocket connection alive during silence.
Signal end of audio stream.
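The client-side messages described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a definitive client: the function names and the 100 ms frame duration are hypothetical choices, while the config fields and the `finalize` and `keepalive` payloads are copied from the example messages. Actually sending the frames is left to whichever WebSocket library is used (config and control messages as text frames, audio as binary frames).

```python
import json
from typing import Iterator


def build_config_message(language_hints=("en",), max_endpoint_delay_ms=500) -> str:
    """Serialize the session config using the fields from the example message."""
    return json.dumps({
        "type": "config",
        "model": "stt-rt-v4",
        "audio_format": "pcm_s16le",
        "sample_rate": 16000,
        "num_channels": 1,
        "enable_speaker_diarization": True,
        "enable_endpoint_detection": True,
        "max_endpoint_delay_ms": max_endpoint_delay_ms,
        "language_hints": list(language_hints),
    })


def split_pcm_frames(pcm: bytes, sample_rate: int = 16000, num_channels: int = 1,
                     frame_ms: int = 100) -> Iterator[bytes]:
    """Yield raw pcm_s16le chunks suitable for sending as binary frames.

    frame_ms is an assumed value; the documentation does not prescribe a
    frame duration. Chunks are cut on whole-sample boundaries.
    """
    block_align = 2 * num_channels  # 2 bytes per 16-bit sample, per channel
    frame_bytes = (sample_rate * frame_ms // 1000) * block_align
    if frame_bytes <= 0:
        raise ValueError("frame_ms is too small for the given sample rate")
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# Control messages, sent as JSON text frames.
FINALIZE_MESSAGE = json.dumps({"type": "finalize"})
KEEPALIVE_MESSAGE = json.dumps({"type": "keepalive"})
```

With the defaults, each frame carries 1600 samples (3200 bytes), so one second of 16 kHz mono audio splits into ten binary frames. The values passed to `split_pcm_frames` must match what was declared in the config message.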
Native Soniox tokens frame forwarded unchanged from the upstream provider.
Native Soniox error frame forwarded unchanged from the upstream provider.
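On the receiving side, a client has to tell the two server frames apart. The sketch below assumes, based only on the example payloads on this page, that an error frame is recognizable by its `error_code` field and that every other frame is a tokens frame. `TranscriptionError` and `parse_server_frame` are hypothetical names, and the contents of individual tokens are passed through untouched because their fields are not described here.

```python
import json


class TranscriptionError(Exception):
    """Raised when the server sends an error frame (hypothetical wrapper)."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_server_frame(raw: str) -> dict:
    """Decode one JSON text frame from the server.

    Returns the tokens frame as a dict, or raises TranscriptionError
    if the frame carries an error_code.
    """
    frame = json.loads(raw)
    if "error_code" in frame:
        raise TranscriptionError(frame["error_code"], frame.get("error_message", ""))
    return {
        "tokens": frame.get("tokens", []),
        "final_audio_proc_ms": frame.get("final_audio_proc_ms", 0),
        "total_audio_proc_ms": frame.get("total_audio_proc_ms", 0),
    }
```

Raising on error frames keeps the receive loop simple: the caller iterates over token frames and handles `TranscriptionError` in one place, for example to refresh an expired API key.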