

WebSocket connections give you sub-100ms latency and bidirectional streaming, ideal for voice agents, live transcription, and any use case where audio flows continuously. If you don’t need real-time streaming, see HTTP vs. WebSocket instead.

Prerequisites:
  • An SLNG API key
  • Basic familiarity with WebSockets (open, send, receive, close)

Protocol overview

This page documents the WebSocket protocol used by SLNG-hosted models and bridges. Third-party providers (Deepgram, KugelAudio, Sarvam) may expose their own native WebSocket formats — check the per-model API reference in the sidebar for provider-specific details.
Supported encodings, sample rates, and optional fields vary by model. For model-specific parameters, see the Text-to-Speech and Speech-to-Text tabs in the sidebar.

TTS WebSocket Protocol

For the full per-model parameter list and response schema, see the model’s page in the Text-to-Speech sidebar.

Connection

wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}

Message Flow

Browse all available TTS models and endpoints on the Text-to-Speech models page.

Client → Server Messages

Initialize Session

Initialize a session with model and voice configuration before sending text. For raw PCM output, use the linear16 encoding (16-bit signed PCM, the most common raw audio format).
{
  "type": "init",
  "model": "aura:2",
  "voice": "aura-2-thalia-en",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16",
    "language": "en",
    "speed": 1.0
  }
}
Parameters:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "init" | Yes | Message type |
| model | string | Yes | Model identifier |
| voice | string | No | Voice identifier |
| config | object | No | Session configuration |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.language | string | No | Language code |
| config.speed | number | No | Speech speed multiplier |
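As a sketch of the client side, a small helper can build and sanity-check the init payload before sending it over the socket. The field names and defaults come from the table above; the helper itself is illustrative, not an official SDK:

```python
import json

def build_tts_init(model, voice=None, sample_rate=24000,
                   encoding="linear16", language="en", speed=1.0):
    """Build the JSON 'init' message. Only 'type' and 'model' are
    required; everything else is optional session configuration."""
    if encoding not in ("linear16", "mp3", "opus"):
        raise ValueError(f"unsupported encoding: {encoding}")
    msg = {
        "type": "init",
        "model": model,
        "config": {
            "sample_rate": sample_rate,
            "encoding": encoding,
            "language": language,
            "speed": speed,
        },
    }
    if voice is not None:
        msg["voice"] = voice
    return json.dumps(msg)

# Reproduces the example payload above
payload = build_tts_init("aura:2", voice="aura-2-thalia-en")
```

You would pass the returned string to your WebSocket library's send method as the first message after the connection opens.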

Send Text for Synthesis

Send text to synthesize into audio. Set flush: true to finalize the current segment immediately instead of waiting for more text.
{
  "type": "text",
  "text": "Hello from SLNG.",
  "flush": false
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "text" | Yes | Message type |
| text | string | Yes | Text to synthesize |
| flush | boolean | No | Whether to flush remaining audio immediately |

Flush Buffer

Force any buffered text/audio to be finalized and delivered.
{
  "type": "flush"
}

Clear Buffer

Clear any queued text/audio from the current session.
{
  "type": "clear"
}

Close Session

Close the current session and stop any further audio generation.
{
  "type": "close"
}

Server → Client Messages

Session Ready

Indicates the session is ready to receive messages.
{
  "type": "ready",
  "session_id": "session_123"
}

Audio Chunk

Chunk of base64-encoded audio data.
{
  "type": "audio_chunk",
  "data": "SGVsbG8gV29ybGQ=",
  "sequence": 1
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "audio_chunk" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
| sequence | integer | No | Sequence number for ordering |
Audio may also arrive as raw binary WebSocket frames instead of base64 JSON. Binary frames have lower overhead.
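Because audio can arrive either as a JSON message with base64 data or as a raw binary frame, a receive loop should branch on the frame type. Most WebSocket libraries deliver binary frames as bytes and text frames as str, which makes the dispatch simple. A minimal, illustrative decoder:

```python
import base64
import json

def decode_audio_frame(frame):
    """Return raw audio bytes from a server frame, or None for
    non-audio control messages (ready, segment_start, etc.)."""
    if isinstance(frame, (bytes, bytearray)):
        # Raw binary WebSocket frame: already audio bytes
        return bytes(frame)
    msg = json.loads(frame)
    if msg.get("type") == "audio_chunk":
        return base64.b64decode(msg["data"])
    return None
```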

Segment Start

Signals the start of a synthesized segment.
{
  "type": "segment_start",
  "segment_id": "seg_1"
}

Segment End

Signals the end of a synthesized segment.
{
  "type": "segment_end",
  "segment_id": "seg_1"
}

Flushed

Acknowledges that buffered output was flushed.
{
  "type": "flushed"
}

Cleared

Acknowledges that queued output was cleared.
{
  "type": "cleared"
}

Audio End

Signals the end of audio generation.
{
  "type": "audio_end",
  "duration": 0.12
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "audio_end" | Yes | Message type |
| duration | number | No | Audio duration in seconds |

Error

Indicates an error occurred during synthesis.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |
Common Error Codes:
  • auth_error: Invalid or missing API key
  • config_error: Invalid configuration
  • rate_limit: Too many requests
  • provider_error: Upstream provider error
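One reasonable way to act on these codes is to treat transient ones as retryable (reconnect with backoff) and the rest as fatal. The split below is a suggestion for client code, not part of the protocol:

```python
# Assumed classification: rate limits and upstream hiccups are worth
# retrying; bad credentials or config will fail the same way again.
RETRYABLE_CODES = {"rate_limit", "provider_error"}
FATAL_CODES = {"auth_error", "config_error"}

def should_retry(error_msg):
    """Given a parsed 'error' message, decide whether to reconnect
    and retry with backoff, or surface the failure immediately."""
    return error_msg.get("code") in RETRYABLE_CODES
```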

STT WebSocket Protocol

For the full per-model parameter list and response schema, see the model’s page in the Speech-to-Text sidebar.

Connection

wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}

Message Flow

Browse all available STT models and endpoints on the Speech-to-Text models page.

Client → Server Messages

Initialize Session

Initialize a session with recognition configuration before streaming audio.
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "enable_vad": true,
    "enable_diarization": false,
    "enable_word_timestamps": true,
    "enable_partials": true
  }
}
Parameters:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "init" | Yes | Message type |
| config | object | No | Recognition configuration |
| config.language | string | No | Language code for recognition |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.enable_vad | boolean | No | Enable voice activity detection |
| config.enable_diarization | boolean | No | Enable speaker diarization |
| config.enable_word_timestamps | boolean | No | Include word-level timestamps |
| config.enable_partials | boolean | No | Enable partial/interim transcripts |
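For linear16 input, each sample is 2 bytes, so the byte size of an audio frame follows directly from the sample rate and the frame duration. A small helper (assuming mono audio, which is typical for STT):

```python
def linear16_frame_bytes(sample_rate_hz, frame_ms, channels=1):
    """Bytes per frame for 16-bit PCM: samples * 2 bytes * channels."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * 2 * channels

# e.g. 16 kHz mono with 20 ms frames -> 640 bytes per frame
```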

Send Audio Data

Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:
Binary frames are recommended over base64-encoded JSON for lower overhead.
Binary frames — send raw PCM audio samples directly as binary WebSocket frames:
ws.send(audioBuffer); // ArrayBuffer or Uint8Array
JSON messages with base64-encoded data:
{
  "type": "audio",
  "data": "SGVsbG8gV29ybGQ="
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "audio" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
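Wrapping the same PCM bytes in either format is straightforward; binary frames avoid the roughly 33% size overhead that base64 adds. A sketch:

```python
import base64
import json

def audio_frame(pcm, binary=True):
    """Encode PCM bytes for sending: either a raw binary WebSocket
    frame, or the base64 JSON 'audio' message shown above."""
    if binary:
        return pcm  # send as-is in a binary frame
    return json.dumps({
        "type": "audio",
        "data": base64.b64encode(pcm).decode("ascii"),
    })
```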

Finalize Transcription

Force the server to finalize any buffered audio and return results. The connection stays open so you can continue streaming.
{
  "type": "finalize"
}

Close Stream

Signal that no more audio will be sent. The server processes any remaining audio, sends final results, then closes the connection.
{
  "type": "close"
}
Use finalize when you want to flush results mid-session (e.g., between utterances). Use close when you are done and want to end the session.

Keep-Alive

Send periodically during silence to prevent idle disconnection.
{
  "type": "keepalive"
}

Server → Client Messages

Session Ready

Indicates the session is ready to receive audio.
{
  "type": "ready",
  "session_id": "session_123"
}

Partial Transcript

Interim transcription result.
{
  "type": "partial_transcript",
  "transcript": "Hello",
  "confidence": 0.91
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "partial_transcript" | Yes | Message type |
| transcript | string | Yes | Transcribed text so far |
| confidence | number | No | Confidence score (0-1) |

Final Transcript

Final transcription result with optional metadata.
{
  "type": "final_transcript",
  "transcript": "Hello world",
  "confidence": 0.97,
  "language": "en",
  "duration": 2.5
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "final_transcript" | Yes | Message type |
| transcript | string | Yes | Complete transcribed text |
| confidence | number | No | Overall confidence score (0-1) |
| language | string | No | Detected or specified language code |
| duration | number | No | Audio duration in seconds |
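A client typically renders partial transcripts as provisional text and commits final ones. A minimal parser for the two message shapes above (illustrative):

```python
import json

def parse_transcript(raw):
    """Return (text, is_final) for transcript messages, else None."""
    msg = json.loads(raw)
    if msg.get("type") == "partial_transcript":
        return msg["transcript"], False
    if msg.get("type") == "final_transcript":
        return msg["transcript"], True
    return None  # ready, error, and other non-transcript messages
```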

Error

Indicates an error occurred during recognition.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |

Next Steps

  • Integration guide: best practices and troubleshooting
  • TTS examples: JavaScript and Python code for real-time TTS
  • STT examples: JavaScript and Python code for real-time STT
  • Protocol comparison: HTTP vs. WebSocket — when to use each