WebSocket connections give you sub-100ms latency and bidirectional streaming, ideal for voice agents, live transcription, and any use case where audio flows continuously. If you don’t need real-time streaming, see HTTP vs. WebSocket instead.
Prerequisites:
An SLNG API key
Basic familiarity with WebSockets (open, send, receive, close)
Protocol overview
This page documents the WebSocket protocol used by SLNG-hosted models and bridges. Third-party providers (Deepgram, KugelAudio, Sarvam) may expose their own native WebSocket formats; check the per-model API reference in the sidebar for provider-specific details.
Supported encodings, sample rates, and optional fields vary by model. For model-specific parameters, see the Text-to-Speech and Speech-to-Text tabs in the sidebar.
TTS WebSocket Protocol
For the full per-model parameter list and response schema, see the model’s page in the Text-to-Speech sidebar.
Connection
wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}
Message Flow
Browse all available TTS models and endpoints on the Text-to-Speech models page.
Client → Server Messages
Initialize Session
Initialize a session with model and voice configuration before sending text.
{
  "type": "init",
  "model": "aura:2",
  "voice": "aura-2-thalia-en",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16", // 16-bit signed PCM (the most common raw audio format)
    "language": "en",
    "speed": 1.0
  }
}
Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "init" | Yes | Message type |
| model | string | Yes | Model identifier |
| voice | string | No | Voice identifier |
| config | object | No | Session configuration |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.language | string | No | Language code |
| config.speed | number | No | Speech speed multiplier |
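The init message above can be assembled with a small helper; a minimal sketch in JavaScript (the helper name buildTtsInit and the pattern of sending it on socket open are illustrative, not part of the API):

```javascript
// Build the documented TTS "init" message as a JSON string.
// Only "type" and "model" are required; optional fields are omitted
// entirely when not provided, per the parameter table above.
function buildTtsInit(model, { voice, config } = {}) {
  const msg = { type: "init", model };
  if (voice) msg.voice = voice;
  if (config) msg.config = config;
  return JSON.stringify(msg);
}

// Typical usage once the WebSocket is open:
// ws.addEventListener("open", () => {
//   ws.send(buildTtsInit("aura:2", {
//     voice: "aura-2-thalia-en",
//     config: { sample_rate: 24000, encoding: "linear16" },
//   }));
// });
```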
Send Text for Synthesis
Send text to synthesize into audio. Set flush: true to finalize the current segment immediately instead of waiting for more text.
{
  "type": "text",
  "text": "Hello from SLNG.",
  "flush": false
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "text" | Yes | Message type |
| text | string | Yes | Text to synthesize |
| flush | boolean | No | Whether to flush remaining audio immediately |
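In a streaming client you typically send text as it becomes available and flush on the final piece. A sketch of the documented "text" message (the buildTextMessage helper is illustrative):

```javascript
// Wrap the documented "text" message. flush defaults to false so the
// server can keep accumulating text before synthesizing a segment.
function buildTextMessage(text, flush = false) {
  return JSON.stringify({ type: "text", text, flush });
}

// Streaming an LLM response sentence by sentence might look like:
// for (const sentence of sentences) ws.send(buildTextMessage(sentence));
// ws.send(buildTextMessage("Goodbye.", true)); // finalize the last segment
```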
Flush Buffer
Force any buffered text/audio to be finalized and delivered.
Clear Buffer
Clear any queued text/audio from the current session.
Close Session
Close the current session and stop any further audio generation.
Server → Client Messages
Session Ready
Indicates the session is ready to receive messages.
{
  "type": "ready",
  "session_id": "session_123"
}
Audio Chunk
Chunk of base64-encoded audio data.
{
  "type": "audio_chunk",
  "data": "SGVsbG8gV29ybGQ=",
  "sequence": 1
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "audio_chunk" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
| sequence | integer | No | Sequence number for ordering |
Audio may also arrive as raw binary WebSocket frames instead of base64 JSON. Binary frames have lower overhead.
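A client therefore has to handle both framings before playback. A minimal sketch, assuming a Node-style client where binary frames arrive as Buffers (browser code would use atob and ArrayBuffer instead); the extractAudio helper name is illustrative:

```javascript
// Normalize an incoming WebSocket message to raw audio bytes.
// Returns null for non-audio JSON messages so the caller can route
// them (ready, segment_start, audio_end, error, ...) separately.
function extractAudio(data) {
  if (typeof data !== "string") {
    // Raw binary frame: the payload already is audio bytes.
    return Buffer.from(data);
  }
  const msg = JSON.parse(data);
  if (msg.type !== "audio_chunk") return null;
  return Buffer.from(msg.data, "base64"); // decode the documented base64 payload
}
```

If the server populates the optional sequence field, chunks can be reordered by it before playback; otherwise play them in arrival order.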
Segment Start
Signals the start of a synthesized segment.
{
  "type": "segment_start",
  "segment_id": "seg_1"
}
Segment End
Signals the end of a synthesized segment.
{
  "type": "segment_end",
  "segment_id": "seg_1"
}
Flushed
Acknowledges that buffered output was flushed.
Cleared
Acknowledges that queued output was cleared.
Audio End
Signals the end of audio generation.
{
  "type": "audio_end",
  "duration": 0.12
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "audio_end" | Yes | Message type |
| duration | number | No | Audio duration |
Error
Indicates an error occurred during synthesis.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |
Common Error Codes:
auth_error: Invalid or missing API key
config_error: Invalid configuration
rate_limit: Too many requests
provider_error: Upstream provider error
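The codes above suggest a simple client-side retry policy. The classification below is an assumption, not part of the protocol: rate_limit and provider_error are treated as transient, auth_error and config_error as fatal.

```javascript
// Classify documented error codes for retry handling.
// This retryable/fatal split is a reasonable default, not a spec rule.
function isRetryable(code) {
  return code === "rate_limit" || code === "provider_error";
}

// A handler might reconnect with backoff on retryable errors:
// ws.addEventListener("message", (ev) => {
//   const msg = JSON.parse(ev.data);
//   if (msg.type === "error" && !isRetryable(msg.code)) {
//     throw new Error(`${msg.code}: ${msg.message}`); // fix config/key first
//   }
// });
```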
STT WebSocket Protocol
For the full per-model parameter list and response schema, see the model’s page in the Speech-to-Text sidebar.
Connection
wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}
Message Flow
Browse all available STT models and endpoints on the Speech-to-Text models page.
Client → Server Messages
Initialize Session
Initialize a session with recognition configuration before streaming audio.
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "enable_vad": true,
    "enable_diarization": false,
    "enable_word_timestamps": true,
    "enable_partials": true
  }
}
Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "init" | Yes | Message type |
| config | object | No | Recognition configuration |
| config.language | string | No | Language code for recognition |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.enable_vad | boolean | No | Enable voice activity detection |
| config.enable_diarization | boolean | No | Enable speaker diarization |
| config.enable_word_timestamps | boolean | No | Include word-level timestamps |
| config.enable_partials | boolean | No | Enable partial/interim transcripts |
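As with TTS, the STT init message can be built with a small helper; a sketch (the buildSttInit name is illustrative):

```javascript
// Build the documented STT "init" message as a JSON string.
// config is optional and omitted entirely when not provided.
function buildSttInit(config) {
  const msg = { type: "init" };
  if (config) msg.config = config;
  return JSON.stringify(msg);
}

// Typical usage for streaming 16 kHz PCM with interim results:
// ws.send(buildSttInit({
//   language: "en",
//   sample_rate: 16000,
//   encoding: "linear16",
//   enable_partials: true,
// }));
```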
Send Audio Data
Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:
Binary frames are recommended over base64-encoded JSON for lower overhead.
Binary frames — send raw PCM audio samples directly as binary WebSocket frames:
ws.send(audioBuffer); // ArrayBuffer or Uint8Array
JSON messages with base64-encoded data:
{
  "type": "audio",
  "data": "SGVsbG8gV29ybGQ="
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "audio" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
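A common pattern is to stream small fixed-size binary frames so the server can start recognizing immediately. The sketch below splits a PCM buffer into frames; the 20 ms frame size is a typical choice, not a protocol requirement (at 16 kHz, 16-bit mono "linear16", 20 ms is 16000 × 2 × 0.02 = 640 bytes):

```javascript
// Split a PCM buffer into fixed-size frames for binary WebSocket sends.
// The last frame may be shorter than frameBytes.
function chunkAudio(buffer, frameBytes = 640) {
  const frames = [];
  for (let offset = 0; offset < buffer.length; offset += frameBytes) {
    frames.push(buffer.subarray(offset, offset + frameBytes));
  }
  return frames;
}

// for (const frame of chunkAudio(pcm)) ws.send(frame);
```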
Finalize Transcription
Force the server to finalize any buffered audio and return results. The connection stays open so you can continue streaming.
Close Stream
Signal that no more audio will be sent. The server processes any remaining audio, sends final results, then closes the connection.
Use finalize when you want to flush results mid-session (e.g., between utterances). Use close when you are done and want to end the session.
Keep-Alive
Send periodically during silence to prevent idle disconnection.
Server → Client Messages
Session Ready
Indicates the session is ready to receive audio.
{
  "type": "ready",
  "session_id": "session_123"
}
Partial Transcript
Interim transcription result.
{
  "type": "partial_transcript",
  "transcript": "Hello",
  "confidence": 0.91
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "partial_transcript" | Yes | Message type |
| transcript | string | Yes | Transcribed text so far |
| confidence | number | No | Confidence score (0-1) |
Final Transcript
Final transcription result with optional metadata.
{
  "type": "final_transcript",
  "transcript": "Hello world",
  "confidence": 0.97,
  "language": "en",
  "duration": 2.5
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "final_transcript" | Yes | Message type |
| transcript | string | Yes | Complete transcribed text |
| confidence | number | No | Overall confidence score (0-1) |
| language | string | No | Detected or specified language code |
| duration | number | No | Audio duration |
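Partials may be revised until the corresponding final arrives, so clients usually keep the two separate: partials overwrite a live caption, finals append to the committed transcript. A sketch of that state machine (the createTranscriptTracker helper is illustrative):

```javascript
// Track the documented transcript messages: "live" holds the latest
// interim text, "committed" accumulates finalized segments.
function createTranscriptTracker() {
  return {
    live: "",
    committed: [],
    handle(raw) {
      const msg = JSON.parse(raw);
      if (msg.type === "partial_transcript") {
        this.live = msg.transcript; // interim: may still change
      } else if (msg.type === "final_transcript") {
        this.committed.push(msg.transcript); // stable: will not change
        this.live = "";
      }
    },
  };
}

// const tracker = createTranscriptTracker();
// ws.addEventListener("message", (ev) => tracker.handle(ev.data));
```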
Error
Indicates an error occurred during recognition.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |
Next Steps
Integration guide: best practices and troubleshooting
TTS examples: JavaScript and Python code for real-time TTS
STT examples: JavaScript and Python code for real-time STT
Protocol comparison: HTTP vs. WebSocket, when to use each