{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "audio_format": "pcm_s16le",
    "enable_partials": true,
    "enable_speaker_diarization": true,
    "language_hints": [
      "en"
    ]
  }
}

{
  "type": "audio",
  "data": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
}

{
  "type": "finalize"
}

{
  "type": "close"
}

{
  "type": "keepalive"
}

{
  "type": "ready",
  "session_id": "sess_abc123"
}

{
  "type": "partial_transcript",
  "transcript": "Hello world",
  "confidence": 0.92,
  "tokens": [
    {
      "text": "Hello",
      "start_ms": 0,
      "end_ms": 500,
      "confidence": 0.95,
      "is_final": false,
      "speaker": "0"
    },
    {
      "text": " world",
      "start_ms": 500,
      "end_ms": 1000,
      "confidence": 0.9,
      "is_final": false,
      "speaker": "0"
    }
  ]
}

{
  "type": "final_transcript",
  "transcript": "Hello world, this is a test.",
  "confidence": 0.95,
  "language": "en",
  "duration": 2.5
}

{
  "type": "error",
  "code": "auth_error",
  "message": "Invalid or expired API key"
}

Real-time speech-to-text transcription using Soniox Speech AI via WebSocket. Supports streaming audio with speaker diarization, automatic language identification, and configurable endpoint detection in 60+ languages.
API key issued by SLNG. Pass as Authorization: Bearer <token> in the WebSocket upgrade request headers.
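A minimal sketch of preparing that header in Python. The helper name `build_upgrade_headers` is hypothetical, and the endpoint URL is not shown on this page, so the sketch stops at building the header mapping that a WebSocket client would attach to the upgrade request.

```python
def build_upgrade_headers(api_key: str) -> dict[str, str]:
    """Return the headers to attach to the WebSocket upgrade (HTTP GET) request."""
    token = api_key.strip()
    if not token:
        # Fail early instead of waiting for the server to reject the handshake.
        raise ValueError("API key must not be empty")
    # The page specifies the scheme: Authorization: Bearer <token>
    return {"Authorization": f"Bearer {token}"}
```

Most WebSocket client libraries accept a mapping like this through an extra-headers option; the option's exact name depends on the library.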
GET
Target world part override. Auto-selected if not provided. Available world parts: eu, na.
Initialize a Soniox STT session with provider-specific recognition configuration.
Stream an audio frame to be transcribed.
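The sketch below splits raw audio into frames and wraps each one in an audio message. It assumes the `data` field carries base64-encoded bytes (the sample value looks like base64), 16-bit mono PCM, and a 100 ms frame length; the frame length and the name `iter_audio_messages` are illustrative choices, not documented values.

```python
import base64
import json
from collections.abc import Iterator


def iter_audio_messages(
    pcm: bytes, sample_rate: int = 16000, frame_ms: int = 100
) -> Iterator[str]:
    """Yield one JSON audio message per frame of 16-bit mono PCM."""
    # 2 bytes per sample for pcm_s16le, one channel assumed.
    bytes_per_frame = sample_rate * 2 * frame_ms // 1000
    if bytes_per_frame <= 0:
        raise ValueError("frame size must be positive")
    for offset in range(0, len(pcm), bytes_per_frame):
        chunk = pcm[offset : offset + bytes_per_frame]
        yield json.dumps(
            {"type": "audio", "data": base64.b64encode(chunk).decode("ascii")}
        )
```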
Force-finalize buffered audio tokens without closing the connection.
Signal end of audio stream and close the connection.
Keep the WebSocket connection alive during silence.
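The three control messages carry only a `type` field, so they can be prepared once and reused. The helper below is a sketch for deciding when to send a keepalive during silence; the 5-second interval is an assumption, since this page does not state the server's idle timeout.

```python
import json

# Control messages as shown in the examples on this page.
FINALIZE = json.dumps({"type": "finalize"})
CLOSE = json.dumps({"type": "close"})
KEEPALIVE = json.dumps({"type": "keepalive"})


def needs_keepalive(last_sent_s: float, now_s: float, interval_s: float = 5.0) -> bool:
    """Return True when nothing has been sent for at least interval_s seconds."""
    # interval_s is a hypothetical default, not a documented limit.
    return now_s - last_sent_s >= interval_s
```

A send loop would typically track the time of the last outgoing message with `time.monotonic()` and send `KEEPALIVE` whenever `needs_keepalive` returns true.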
Indicates the session is ready to receive audio.
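One way to respect this ordering is to hold outgoing audio until the ready message arrives. The class below is a sketch of that idea; `ReadyGate` is a hypothetical name, and whether the server tolerates audio sent before ready is not stated here.

```python
import json
from collections import deque


class ReadyGate:
    """Buffer outgoing messages until the server reports the session is ready."""

    def __init__(self) -> None:
        self.ready = False
        self.session_id: str | None = None
        self._pending: deque[str] = deque()

    def submit(self, outgoing: str) -> list[str]:
        """Return the messages that may be sent now (empty while not ready)."""
        if not self.ready:
            self._pending.append(outgoing)
            return []
        return [outgoing]

    def on_server_message(self, raw: str) -> list[str]:
        """Process a server message; on ready, release everything buffered."""
        msg = json.loads(raw)
        if msg.get("type") != "ready":
            return []
        self.ready = True
        self.session_id = msg.get("session_id")
        flushed = list(self._pending)
        self._pending.clear()
        return flushed
```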
Transcription result from Soniox with token-level detail.
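The token list makes it possible to rebuild speaker turns on the client. The following sketch merges consecutive tokens from the same speaker into one segment, using only the token fields shown in the example (`text`, `start_ms`, `end_ms`, `speaker`); the function name `speaker_segments` is hypothetical.

```python
import json


def speaker_segments(raw: str) -> list[tuple[str, str, int, int]]:
    """Group consecutive tokens by speaker as (speaker, text, start_ms, end_ms)."""
    msg = json.loads(raw)
    segments: list[tuple[str, str, int, int]] = []
    for tok in msg.get("tokens", []):
        speaker = tok.get("speaker", "")
        if segments and segments[-1][0] == speaker:
            # Same speaker as the previous token: extend the open segment.
            _, text, start_ms, _ = segments[-1]
            segments[-1] = (speaker, text + tok["text"], start_ms, tok["end_ms"])
        else:
            segments.append((speaker, tok["text"], tok["start_ms"], tok["end_ms"]))
    return segments
```

Token text is concatenated as-is because the sample tokens carry their own leading spaces (" world").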
Final transcription result with optional metadata.
Indicates an error occurred during recognition.