{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16"
  }
}

{
  "type": "audio",
  "data": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
}

{
  "type": "finalize"
}

{
  "type": "close"
}

{
  "type": "ready",
  "session_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "context_id": "stt-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "model": "whisper-large-v3",
  "commands": [
    "flush",
    "stop",
    "clear",
    "context"
  ],
  "features": {
    "vad": true,
    "partial_transcripts": true,
    "final_transcripts": true,
    "language_detection": true,
    "speaker_diarization": true
  }
}

{
  "type": "partial_transcript",
  "context_id": "stt-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "session_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "transcript": "The quick brown fox jumps over the",
  "language": "en",
  "confidence": 1,
  "is_final": false,
  "audio_duration": 2.56
}

{
  "type": "final_transcript",
  "context_id": "stt-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "session_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "transcript": "The quick brown fox jumps over the lazy dog.",
  "language": "en",
  "confidence": 1,
  "is_final": true,
  "audio_duration": 3.84,
  "segments": [
    {
      "text": " The quick brown fox jumps over the lazy dog.",
      "start": 0.066,
      "end": 3.71,
      "avg_logprob": -0.08
    }
  ]
}

{
  "type": "error",
  "code": "auth_error",
  "message": "Invalid or expired API key"
}

Real-time speech-to-text transcription using OpenAI’s Whisper Large v3 model via WebSocket. Supports streaming audio input with intelligent Voice Activity Detection (VAD), partial transcripts for immediate feedback, and automatic language detection. Perfect for live transcription, voice commands, and real-time captioning.
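
The client-side messages in the examples (init, audio, finalize, close) can be produced with a few small helpers. The following is a minimal sketch, not an official client: the function names and the 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono audio) are illustrative assumptions, and the `data` field is assumed to carry base64 text because the sample payload decodes as base64.

```python
import base64
import json
from typing import Iterator


def init_message(language: str = "en", sample_rate: int = 16000,
                 encoding: str = "linear16") -> str:
    """Serialize an `init` message using the config fields from the example."""
    return json.dumps({
        "type": "init",
        "config": {
            "language": language,
            "sample_rate": sample_rate,
            "encoding": encoding,
        },
    })


def audio_message(chunk: bytes) -> str:
    """Serialize an `audio` message; assumes `data` is base64-encoded audio."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio", "data": encoded})


def control_message(kind: str) -> str:
    """Serialize a body-less control message (`finalize` or `close`)."""
    if kind not in ("finalize", "close"):
        raise ValueError(f"unsupported control message: {kind!r}")
    return json.dumps({"type": kind})


def session_messages(pcm: bytes, chunk_size: int = 3200) -> Iterator[str]:
    """Yield the messages for one utterance: init, audio frames, finalize, close."""
    yield init_message()
    for offset in range(0, len(pcm), chunk_size):
        yield audio_message(pcm[offset:offset + chunk_size])
    yield control_message("finalize")
    yield control_message("close")
```

Each yielded string would be sent as one text frame over the open WebSocket. Keeping serialization separate from the transport makes the message shapes easy to unit-test without a network connection.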
API key issued by SLNG. Pass as Authorization: Bearer <token> in the WebSocket upgrade request headers.
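
As a small illustration of the header format described above, the sketch below builds the handshake headers from an API key. The helper name `auth_headers` is hypothetical, and how the resulting dictionary is handed to the handshake depends on the WebSocket client library in use.

```python
from typing import Dict


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build the Authorization header for the WebSocket upgrade request."""
    key = api_key.strip()
    if not key:
        # Fail early rather than wait for the server's auth_error message.
        raise ValueError("API key must not be empty")
    return {"Authorization": f"Bearer {key}"}
```

A missing or invalid key is reported by the server as an error message with code auth_error, so validating the key locally only catches the empty case.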
Target world part override. Auto-selected if not provided. Available world parts: eu.

init: Initialize an SLNG-hosted Whisper Large v3 STT session.
audio: Stream an audio frame to be transcribed.
finalize: Force-finalize buffered audio tokens without closing the connection.
close: Signal end of audio stream and close the connection.
ready: Indicates the Whisper session is ready to receive audio.
partial_transcript: Interim Whisper transcription result, updated as more audio is processed.
final_transcript: Final Whisper transcription result with segment-level detail.
error: Indicates an error occurred during recognition.
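
The server messages described above (ready, partial_transcript, final_transcript, error) can be routed by their `type` field. The following is a minimal sketch under stated assumptions: `TranscriptState` and `SttError` are hypothetical names, only fields that appear in the example payloads are read, and unknown message types are silently ignored.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


class SttError(Exception):
    """Raised when the server sends an `error` message."""

    def __init__(self, code: Optional[str], message: Optional[str]) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


@dataclass
class TranscriptState:
    """Tracks the session id, the latest partial, and all final transcripts."""

    session_id: Optional[str] = None
    partial: str = ""
    finals: List[str] = field(default_factory=list)

    def handle(self, raw: str) -> None:
        """Apply one JSON text frame received from the server."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "ready":
            self.session_id = msg.get("session_id")
        elif kind == "partial_transcript":
            # Each partial replaces the previous one.
            self.partial = msg.get("transcript", "")
        elif kind == "final_transcript":
            self.finals.append(msg.get("transcript", ""))
            self.partial = ""
        elif kind == "error":
            raise SttError(msg.get("code"), msg.get("message"))
        # Any other message type is ignored in this sketch.
```

A receive loop would call `state.handle(frame)` for every incoming text frame and display `state.partial` for immediate feedback, committing text only when a final transcript arrives.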