WebSocket connections give you sub-100ms latency and bidirectional streaming, ideal for voice agents, live transcription, and any use case where audio flows continuously. If you don’t need real-time streaming, see HTTP vs. WebSocket instead.

Prerequisites:
  • An SLNG API key
  • Basic familiarity with WebSockets (open, send, receive, close)

SLNG Protocol

All SLNG WebSocket endpoints use a consistent protocol, regardless of the underlying provider. You can switch models without changing your integration code.
Supported encodings, sample rates, and optional fields vary by model. This page covers the shared protocol. For model-specific parameters, see the Text-to-Speech and Speech-to-Text tabs in the sidebar.

TTS WebSocket Protocol

Connection

wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}
Available Endpoints:
  • wss://api.slng.ai/v1/tts/deepgram/aura:2
  • wss://api.slng.ai/v1/tts/slng/rime/arcana:3-en
  • wss://api.slng.ai/v1/tts/slng/canopylabs/orpheus:en

Client → Server Messages

Initialize Session

Initialize a session with model and voice configuration before sending text.
{
  "type": "init",
  "model": "aura:2",
  "voice": "aura-2-thalia-en",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16",  // 16-bit signed PCM — the most common raw audio format
    "language": "en",
    "speed": 1.0
  }
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| type | "init" | Yes | Message type |
| model | string | Yes | Model identifier |
| voice | string | No | Voice identifier |
| config | object | No | Session configuration |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.language | string | No | Language code |
| config.speed | number | No | Speech speed multiplier |
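Putting the handshake together: the sketch below builds the init payload from the fields in the table above and shows where it would be sent. The ?token= query-parameter auth style is an assumption (see Troubleshooting for header-based auth), and the endpoint shown is one of the examples listed earlier.

```javascript
// Build the init payload for a TTS session (fields from the table above).
// Optional fields are only included when provided.
function buildTtsInit({ model, voice, config }) {
  const msg = { type: "init", model };
  if (voice) msg.voice = voice;
  if (config) msg.config = config;
  return JSON.stringify(msg);
}

// Usage sketch — ?token= auth is an assumption; some libraries support
// handshake headers instead (see Troubleshooting):
// const ws = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2?token=YOUR_KEY");
// ws.onopen = () => ws.send(buildTtsInit({
//   model: "aura:2",
//   voice: "aura-2-thalia-en",
//   config: { sample_rate: 24000, encoding: "linear16" },
// }));
```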

Send Text for Synthesis

Send text to synthesize into audio output.
{
  "type": "text",
  "text": "Hello from SLNG.",
  "flush": false
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| type | "text" | Yes | Message type |
| text | string | Yes | Text to synthesize |
| flush | boolean | No | Whether to flush remaining audio immediately |

Flush Buffer

Force any buffered text/audio to be finalized and delivered.
{
  "type": "flush"
}

Clear Buffer

Clear any queued text/audio from the current session.
{
  "type": "clear"
}

Cancel Generation

Cancel the current generation and stop any further audio.
{
  "type": "cancel"
}

Server → Client Messages

Session Ready

Indicates the session is ready to receive messages.
{
  "type": "ready",
  "session_id": "session_123"
}

Audio Chunk

Chunk of base64-encoded audio data.
{
  "type": "audio_chunk",
  "data": "SGVsbG8gV29ybGQ=",
  "sequence": 1
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | "audio_chunk" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
| sequence | integer | No | Sequence number for ordering |
Audio may also arrive as raw binary WebSocket frames instead of base64 JSON. Binary frames have lower overhead.
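A receive-side helper can normalize both delivery modes into raw bytes. This is a sketch: it assumes any string frame passed in has already been identified as an audio_chunk (in practice you would dispatch on type first), and it assumes ws.binaryType has been set to "arraybuffer".

```javascript
// Decode an audio chunk into raw bytes, handling both the base64 JSON
// form and raw binary WebSocket frames described above.
function chunkToBytes(frame) {
  if (typeof frame === "string") {
    // JSON message: { type: "audio_chunk", data: <base64> }
    const msg = JSON.parse(frame);
    const bin = atob(msg.data);
    const bytes = new Uint8Array(bin.length);
    for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
    return bytes;
  }
  // Raw binary frame (ArrayBuffer, given ws.binaryType = "arraybuffer")
  return new Uint8Array(frame);
}
```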

Segment Start

Signals the start of a synthesized segment.
{
  "type": "segment_start",
  "segment_id": "seg_1"
}

Segment End

Signals the end of a synthesized segment.
{
  "type": "segment_end",
  "segment_id": "seg_1"
}

Flushed

Acknowledges that buffered output was flushed.
{
  "type": "flushed"
}

Cleared

Acknowledges that queued output was cleared.
{
  "type": "cleared"
}

Audio End

Signals the end of audio generation.
{
  "type": "audio_end",
  "duration": 0.12
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | "audio_end" | Yes | Message type |
| duration | number | No | Audio duration in seconds |

Error

Indicates an error occurred during synthesis.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |
Common Error Codes:
  • auth_error: Invalid or missing API key
  • config_error: Invalid configuration
  • rate_limit: Too many requests
  • provider_error: Upstream provider error

STT WebSocket Protocol

Connection

wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}
Available Endpoints:
  • wss://api.slng.ai/v1/stt/deepgram/nova:2
  • wss://api.slng.ai/v1/stt/slng/openai/whisper:large-v3

Client → Server Messages

Initialize Session

Initialize a session with recognition configuration before streaming audio.
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "enable_vad": true,
    "enable_diarization": false,
    "enable_word_timestamps": true,
    "enable_partials": true
  }
}
Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| type | "init" | Yes | Message type |
| config | object | No | Recognition configuration |
| config.language | string | No | Language code for recognition |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.enable_vad | boolean | No | Enable voice activity detection |
| config.enable_diarization | boolean | No | Enable speaker diarization |
| config.enable_word_timestamps | boolean | No | Include word-level timestamps |
| config.enable_partials | boolean | No | Enable partial/interim transcripts |

Send Audio Data

Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:
Binary frames are recommended over base64-encoded JSON for lower overhead.
Binary frames: Send raw PCM audio samples directly as binary WebSocket frames:
// Send raw audio bytes directly
ws.send(audioBuffer); // ArrayBuffer or Uint8Array
JSON messages with base64-encoded data:
{
  "type": "audio",
  "data": "SGVsbG8gV29ybGQ="
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | "audio" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
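The Troubleshooting section recommends sending audio in 20-100ms frames. A chunking helper for the binary path, assuming 16kHz mono linear16 (2 bytes per sample), might look like:

```javascript
// Split a PCM byte buffer into fixed-duration frames for streaming.
// At 16 kHz mono linear16, 20 ms = 16000 * 0.02 samples * 2 bytes = 640 bytes.
function frameAudio(pcmBytes, sampleRate = 16000, frameMs = 20) {
  const bytesPerFrame = Math.round((sampleRate * frameMs) / 1000) * 2;
  const frames = [];
  for (let off = 0; off < pcmBytes.length; off += bytesPerFrame) {
    frames.push(pcmBytes.subarray(off, off + bytesPerFrame));
  }
  return frames;
}

// Usage sketch, after the session is ready:
// for (const frame of frameAudio(capturedPcm)) ws.send(frame);
```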

Finalize Transcription

Signal that no more audio frames will be sent.
{
  "type": "finalize"
}

Server → Client Messages

Session Ready

Indicates the session is ready to receive audio.
{
  "type": "ready",
  "session_id": "session_123"
}

Partial Transcript

Interim transcription result.
{
  "type": "partial_transcript",
  "transcript": "Hello",
  "confidence": 0.91
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | "partial_transcript" | Yes | Message type |
| transcript | string | Yes | Transcribed text so far |
| confidence | number | No | Confidence score (0-1) |

Final Transcript

Final transcription result with optional metadata.
{
  "type": "final_transcript",
  "transcript": "Hello world",
  "confidence": 0.97,
  "language": "en",
  "duration": 2.5
}
| Field | Type | Required | Description |
|---|---|---|---|
| type | "final_transcript" | Yes | Message type |
| transcript | string | Yes | Complete transcribed text |
| confidence | number | No | Overall confidence score (0-1) |
| language | string | No | Detected or specified language code |
| duration | number | No | Audio duration in seconds |
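The key consumer-side detail is that partials overwrite each other (each one is a revised hypothesis) while finals accumulate. A minimal assembler sketching that behavior:

```javascript
// Track interim text and committed segments from STT messages.
class TranscriptAssembler {
  constructor() {
    this.partial = ""; // latest interim hypothesis (may be revised)
    this.finals = [];  // committed final_transcript texts
  }

  handle(raw) {
    const msg = JSON.parse(raw);
    if (msg.type === "partial_transcript") {
      this.partial = msg.transcript; // overwrite, don't append
    } else if (msg.type === "final_transcript") {
      this.finals.push(msg.transcript);
      this.partial = "";             // interim text is now stale
    }
    return msg;
  }

  text() {
    return [...this.finals, this.partial].filter(Boolean).join(" ");
  }
}
```

Wire it up with `ws.onmessage = (e) => assembler.handle(e.data)` and re-render from `text()` after each message.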

Error

Indicates an error occurred during recognition.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
FieldTypeRequiredDescription
type"error"YesMessage type
codestringYesError code
messagestringYesHuman-readable error description

Complete Examples

For full working implementations with audio playback, error handling, and reconnection, see the TTS and STT WebSocket examples.

Interruption & Cancellation (TTS)

For voice agents, you need to handle interruptions. When a user starts speaking, stop the current TTS output immediately.
The key messages for interruption control:
  • Send { "type": "cancel" } to stop server-side generation
  • Send { "type": "clear" } to discard queued audio
  • Clear your local audio buffer and stop playback
For complete patterns (immediate interrupt, clear-and-restart, fade-out, voice agent loop), see TTS WebSocket examples.
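The three steps above can be sketched as a single helper. Here audioQueue and stopPlayback are hypothetical stand-ins for your playback layer:

```javascript
// Stop speech mid-utterance: cancel server-side generation, drop queued
// output on both sides, and halt local playback.
function interrupt(ws, audioQueue, stopPlayback) {
  ws.send(JSON.stringify({ type: "cancel" })); // stop server-side generation
  ws.send(JSON.stringify({ type: "clear" }));  // discard queued audio
  audioQueue.length = 0;                       // drop locally buffered chunks
  stopPlayback();                              // silence whatever is playing
}
```

Call this the moment your VAD or STT stream detects user speech.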

Best Practices

Items 1 and 2 are required for any production integration. Items 3-5 improve quality and resilience.

1. Connection Management

Implement Reconnection Logic:
class ResilientWebSocket {
  constructor(url) {
    this.url = url;
    this.backoff = 1000;
    this.maxBackoff = 30000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onclose = () => {
      console.log(`Reconnecting in ${this.backoff}ms`);
      setTimeout(() => this.connect(), this.backoff);
      this.backoff = Math.min(this.backoff * 2, this.maxBackoff);
    };

    this.ws.onopen = () => {
      console.log("Connected");
      this.backoff = 1000; // Reset backoff on successful connection
    };
  }
}

2. Error Handling

Always handle errors gracefully:
ws.onerror = (error) => {
  console.error("WebSocket error:", error);
  // Notify user, attempt recovery
};

ws.onmessage = (event) => {
  if (typeof event.data === "string") {
    const message = JSON.parse(event.data);
    if (message.type === "error") {
      handleServerError(message.code, message.message);
    }
  }
};

3. Buffer Management (TTS)

Use flush strategically to control latency vs. quality:
// For low latency: flush after each sentence
ws.send(JSON.stringify({ type: "text", text: sentence }));
ws.send(JSON.stringify({ type: "flush" }));

// For better quality: batch multiple sentences
ws.send(JSON.stringify({ type: "text", text: paragraph }));
// ... send more text ...
ws.send(JSON.stringify({ type: "flush" })); // Flush at the end

4. Audio Format Consistency

Ensure your audio format matches configuration:
// Configuration
{
  encoding: 'linear16',    // 16-bit PCM
  sample_rate: 16000       // 16kHz
}

// Audio capture must match:
// - 16-bit samples
// - 16000 Hz sample rate
// - Single channel (mono)
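Browser capture via the Web Audio API yields 32-bit float samples in the -1..1 range, so sending linear16 requires a conversion step. A common sketch:

```javascript
// Convert Web Audio float samples (-1..1) to 16-bit signed PCM (linear16).
function floatTo16BitPcm(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i])); // clamp
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;               // scale to int16 range
  }
  return out;
}
```

Send the resulting buffer (`floatTo16BitPcm(samples).buffer`) as a binary frame; byte order is little-endian on all mainstream platforms, which is what linear16 normally expects.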

5. Heartbeat/Keep-Alive

Implement ping/pong to keep connection alive:
let pingInterval = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "ping" }));
  }
}, 30000); // Every 30 seconds

ws.onclose = () => {
  clearInterval(pingInterval);
};

Troubleshooting

Problem: WebSocket disconnects unexpectedly
  • Implement reconnection logic with exponential backoff (see Connection Management above)
  • Send periodic ping messages every 30 seconds to prevent idle timeouts
  • If behind a corporate proxy, confirm it supports WebSocket upgrades (Connection: Upgrade header)
  • Run a WebSocket echo test (wscat -c wss://echo.websocket.org) to rule out local network issues
Problem: Choppy or distorted audio playback
  • Buffer at least 200ms of audio before starting playback to absorb network jitter
  • Use the WebAudio API (AudioContext) instead of <audio> elements for gapless chunk playback
  • Confirm the sample rate in your audio player matches the sample_rate from your init config
  • 24kHz linear16 audio requires ~384 kbps; verify your connection can sustain this
Problem: Transcription results lag behind audio
  • Send audio in 20-100ms chunks (640-3200 bytes at 16kHz linear16) rather than large buffers
  • Measure round-trip time with Date.now() around send/receive to isolate network vs. server latency
  • Confirm encoding and sample_rate in your init config match your actual audio format
  • For real-time use, prefer Deepgram Nova, which is optimized for streaming latency
Problem: Connection rejected with 401
  • Pass the API key as a header during the WebSocket handshake: Authorization: Bearer YOUR_KEY
  • Verify your key is active in the SLNG dashboard
  • Check for trailing whitespace or newlines in the key string
  • Some WebSocket libraries don’t support custom headers; pass the key as a query parameter (?token=YOUR_KEY) if needed

Performance Tips

Keep WebSocket connections open across multiple operations rather than reconnecting each time. For TTS, send longer text chunks for better synthesis quality. For STT, stream audio in 20-100ms chunks (640-3200 bytes at 16kHz linear16). Handle audio encoding and decoding client-side, and track round-trip latency to catch regressions early.
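One rough way to track round-trip latency is to timestamp each send and pair it with the next response in FIFO order. This is an approximation (the protocol does not echo a request id to correlate against), sketched with an injectable clock:

```javascript
// Rough round-trip latency tracker: record send times, pop on receive.
// FIFO pairing of sends to receives is an approximation.
class LatencyTracker {
  constructor(now = () => Date.now()) {
    this.now = now;     // injectable clock (useful for testing)
    this.pending = [];  // send timestamps awaiting a response
    this.samples = [];  // measured round-trip times in ms
  }

  onSend() {
    this.pending.push(this.now());
  }

  onReceive() {
    const t0 = this.pending.shift();
    if (t0 !== undefined) this.samples.push(this.now() - t0);
  }

  p50() {
    const s = [...this.samples].sort((a, b) => a - b);
    return s.length ? s[Math.floor(s.length / 2)] : null;
  }
}
```

Call `onSend()` wherever you call `ws.send(...)` and `onReceive()` in `ws.onmessage`, then log `p50()` periodically to spot regressions.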

Next Steps