- An SLNG API key
- Basic familiarity with WebSockets (open, send, receive, close)
SLNG Protocol
All SLNG WebSocket endpoints use a consistent protocol, regardless of the underlying provider. You can switch models without changing your integration code.

Supported encodings, sample rates, and optional fields vary by model. This page covers the shared protocol. For model-specific parameters, see the Text-to-Speech and Speech-to-Text tabs in the sidebar.
TTS WebSocket Protocol
Connection
- wss://api.slng.ai/v1/tts/deepgram/aura:2
- wss://api.slng.ai/v1/tts/slng/rime/arcana:3-en
- wss://api.slng.ai/v1/tts/slng/canopylabs/orpheus:en
Client → Server Messages
Initialize Session
Initialize a session with model and voice configuration before sending text.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "init" | Yes | Message type |
| model | string | Yes | Model identifier |
| voice | string | No | Voice identifier |
| config | object | No | Session configuration |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.language | string | No | Language code |
| config.speed | number | No | Speech speed multiplier |
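The fields above map directly onto a JSON object. The following is a minimal sketch of assembling one, not an official client: the helper name `build_tts_init` and the placeholder model and voice identifiers are assumptions made for illustration, and real identifiers come from the model-specific pages.

```python
import json


def build_tts_init(model, voice=None, **config):
    """Build a TTS "init" message; optional fields are left out when unset."""
    message = {"type": "init", "model": model}
    if voice is not None:
        message["voice"] = voice
    # Keep only the config keys the caller actually provided.
    config = {key: value for key, value in config.items() if value is not None}
    if config:
        message["config"] = config
    return json.dumps(message)


init_payload = build_tts_init(
    "example-model",        # hypothetical model identifier
    voice="example-voice",  # hypothetical voice identifier
    sample_rate=24000,
    encoding="linear16",
)
```

The resulting string would be sent as the first text frame after the connection opens.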
Send Text for Synthesis
Send text to synthesize into audio output.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "text" | Yes | Message type |
| text | string | Yes | Text to synthesize |
| flush | boolean | No | Whether to flush remaining audio immediately |
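When text arrives incrementally (for example from a language model), one plausible pattern is to send each piece as its own `text` message and set `flush` only on the last one. The sketch below assumes that pattern; the helper names are hypothetical and the exact effect of `flush` may differ between models.

```python
import json


def build_text_message(text, flush=False):
    """Build a "text" message; the flush field is only included when set."""
    message = {"type": "text", "text": text}
    if flush:
        message["flush"] = True
    return json.dumps(message)


def text_messages(pieces):
    """Return one message per piece, flushing only on the final piece."""
    pieces = list(pieces)
    last = len(pieces) - 1
    return [
        build_text_message(piece, flush=(index == last))
        for index, piece in enumerate(pieces)
    ]


messages = text_messages(["Hello, ", "world."])
```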
Flush Buffer
Force any buffered text/audio to be finalized and delivered.
Clear Buffer
Clear any queued text/audio from the current session.
Cancel Generation
Cancel the current generation and stop any further audio.
Server → Client Messages
Session Ready
Indicates the session is ready to receive messages.
Audio Chunk
Chunk of base64-encoded audio data.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "audio_chunk" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
| sequence | integer | No | Sequence number for ordering |
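On the client, each chunk has to be base64-decoded before playback. A minimal sketch of collecting chunks into one byte string follows; the function name is hypothetical, and it assumes that either every chunk carries `sequence` or none does, falling back to arrival order in the latter case.

```python
import base64
import json


def assemble_audio(raw_messages):
    """Decode audio_chunk messages and join them in sequence order."""
    chunks = []
    for arrival_index, raw in enumerate(raw_messages):
        message = json.loads(raw)
        if message.get("type") != "audio_chunk":
            continue  # ignore other message types in this sketch
        # sequence is optional; use arrival order when it is absent.
        order = message.get("sequence", arrival_index)
        chunks.append((order, base64.b64decode(message["data"])))
    chunks.sort(key=lambda item: item[0])
    return b"".join(data for _, data in chunks)
```

A streaming player would normally hand each decoded chunk to the audio device as it arrives rather than waiting for the full list.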
Segment Start
Signals the start of a synthesized segment.
Segment End
Signals the end of a synthesized segment.
Flushed
Acknowledges that buffered output was flushed.
Cleared
Acknowledges that queued output was cleared.
Audio End
Signals the end of audio generation.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "audio_end" | Yes | Message type |
| duration | number | No | Audio duration |
Error
Indicates an error occurred during synthesis.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |

Error codes include:
- auth_error: Invalid or missing API key
- config_error: Invalid configuration
- rate_limit: Too many requests
- provider_error: Upstream provider error
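A client can turn `error` messages into exceptions and decide per code whether to retry. The sketch below is one way to do that; treating `rate_limit` and `provider_error` as retryable is a client-side policy assumed here for illustration, not something the protocol specifies, and the class and function names are hypothetical.

```python
import json

# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE_CODES = {"rate_limit", "provider_error"}


class SlngError(Exception):
    """Exception wrapper for a protocol "error" message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = code in RETRYABLE_CODES


def error_from_message(raw_message):
    """Return an SlngError for "error" messages, or None for anything else."""
    message = json.loads(raw_message)
    if message.get("type") != "error":
        return None
    return SlngError(message["code"], message["message"])
```

A receive loop could call `error_from_message` on every incoming text frame and raise the result when it is not `None`.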
STT WebSocket Protocol
Connection
- wss://api.slng.ai/v1/stt/deepgram/nova:2
- wss://api.slng.ai/v1/stt/slng/openai/whisper:large-v3
Client → Server Messages
Initialize Session
Initialize a session with recognition configuration before streaming audio.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "init" | Yes | Message type |
| config | object | No | Recognition configuration |
| config.language | string | No | Language code for recognition |
| config.sample_rate | number | No | Audio sample rate in Hz |
| config.encoding | "linear16" \| "mp3" \| "opus" | No | Audio encoding format |
| config.enable_vad | boolean | No | Enable voice activity detection |
| config.enable_diarization | boolean | No | Enable speaker diarization |
| config.enable_word_timestamps | boolean | No | Include word-level timestamps |
| config.enable_partials | boolean | No | Enable partial/interim transcripts |
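Because every `config` field is optional, a small builder that validates values locally can catch typos before they reach the server. This is a sketch under that assumption; `build_stt_init` is a hypothetical helper, and the encoding set only mirrors the table, so a given model may accept fewer values.

```python
import json

# Encodings listed in the table above; individual models may accept fewer.
PROTOCOL_ENCODINGS = {"linear16", "mp3", "opus"}
FEATURE_FLAGS = {
    "enable_vad",
    "enable_diarization",
    "enable_word_timestamps",
    "enable_partials",
}


def build_stt_init(language=None, sample_rate=None, encoding=None, **flags):
    """Build an STT "init" message, rejecting fields the table does not list."""
    if encoding is not None and encoding not in PROTOCOL_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    unknown = set(flags) - FEATURE_FLAGS
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    config = {"language": language, "sample_rate": sample_rate, "encoding": encoding}
    config.update(flags)
    # Drop unset fields so optional keys stay absent from the message.
    config = {key: value for key, value in config.items() if value is not None}
    message = {"type": "init"}
    if config:
        message["config"] = config
    return json.dumps(message)


stt_init = build_stt_init(
    language="en", sample_rate=16000, encoding="linear16", enable_partials=True
)
```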
Send Audio Data
Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:

Binary frames: Send raw PCM audio samples directly as binary WebSocket frames.

JSON messages: Send base64-encoded audio in a message with the following fields:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "audio" | Yes | Message type |
| data | string | Yes | Base64-encoded audio data |
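The `audio` message wraps the same bytes a binary frame would carry, encoded as base64. A minimal sketch of building one is shown below; the helper name is hypothetical. Base64 expands the payload by roughly one third, which is worth keeping in mind when choosing between the two formats.

```python
import base64
import json


def build_audio_message(pcm_bytes):
    """Wrap raw audio bytes in an "audio" message with base64-encoded data."""
    encoded = base64.b64encode(pcm_bytes).decode("ascii")
    return json.dumps({"type": "audio", "data": encoded})


# 20 ms of silence at 16 kHz, 16-bit mono: 320 samples * 2 bytes = 640 bytes.
frame = bytes(640)
audio_message = build_audio_message(frame)
encoded_length = len(json.loads(audio_message)["data"])  # 856 characters
```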
Finalize Transcription
Signal that no more audio frames will be sent.
Server → Client Messages
Session Ready
Indicates the session is ready to receive audio.
Partial Transcript
Interim transcription result.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "partial_transcript" | Yes | Message type |
| transcript | string | Yes | Transcribed text so far |
| confidence | number | No | Confidence score (0-1) |
Final Transcript
Final transcription result with optional metadata.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "final_transcript" | Yes | Message type |
| transcript | string | Yes | Complete transcribed text |
| confidence | number | No | Overall confidence score (0-1) |
| language | string | No | Detected or specified language code |
| duration | number | No | Audio duration |
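A typical client shows partial transcripts as provisional text and replaces them once a final transcript arrives. The sketch below assumes that each partial supersedes the previous one and that each final transcript covers one finished stretch of speech; neither assumption is stated by the protocol, and `TranscriptBuffer` is a hypothetical name.

```python
import json


class TranscriptBuffer:
    """Track committed final transcripts plus the latest provisional partial."""

    def __init__(self):
        self.finals = []
        self.partial = ""

    def feed(self, raw_message):
        message = json.loads(raw_message)
        kind = message.get("type")
        if kind == "partial_transcript":
            # Assumption: a new partial replaces the previous partial.
            self.partial = message["transcript"]
        elif kind == "final_transcript":
            # Assumption: a final replaces the partial for the same speech.
            self.finals.append(message["transcript"])
            self.partial = ""

    def text(self):
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```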
Error
Indicates an error occurred during recognition.

| Field | Type | Required | Description |
|---|---|---|---|
| type | "error" | Yes | Message type |
| code | string | Yes | Error code |
| message | string | Yes | Human-readable error description |
Complete Examples
For full working implementations with audio playback, error handling, and reconnection:
- TTS WebSocket examples (JavaScript/TypeScript and Python)
- STT WebSocket examples (JavaScript/TypeScript and Python)
Interruption & Cancellation (TTS)
For voice agents, you need to handle interruptions. When a user starts speaking, stop the current TTS output immediately.
- Send `{ "type": "cancel" }` to stop server-side generation
- Send `{ "type": "clear" }` to discard queued audio
- Clear your local audio buffer and stop playback
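The three steps above can be sketched as a single function. Here `send` stands in for whatever sends a text frame on your WebSocket and `playback_queue` for your local buffer of undecoded or unplayed audio; both are hypothetical placeholders, and stopping the audio device itself depends on the playback library you use.

```python
import json
from collections import deque


def interrupt(send, playback_queue):
    """Run the interruption steps in order: cancel, clear, drop local audio."""
    send(json.dumps({"type": "cancel"}))  # stop server-side generation
    send(json.dumps({"type": "clear"}))   # discard audio queued on the server
    playback_queue.clear()                # drop audio buffered locally


# Example with stand-ins: a list collects what would be sent over the socket.
sent = []
queue = deque([b"chunk-1", b"chunk-2"])
interrupt(sent.append, queue)
```

Audio chunks that were already in flight may still arrive after the cancel is sent, so a client may also want to ignore `audio_chunk` messages until the next response begins.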
Best Practices
Items 1 and 2 are required for any production integration. Items 3-5 improve quality and resilience.
1. Connection Management
Implement reconnection logic so that a dropped connection is re-established automatically.
2. Error Handling
Always handle errors gracefully.
3. Buffer Management (TTS)
Use flush strategically to control latency vs. quality.
4. Audio Format Consistency
Ensure your audio format matches your configuration.
5. Heartbeat/Keep-Alive
Implement ping/pong to keep the connection alive.
Troubleshooting
Connection Drops
Problem: WebSocket disconnects unexpectedly
- Implement reconnection logic with exponential backoff (see Connection Management above)
- Send periodic ping messages every 30 seconds to prevent idle timeouts
- If behind a corporate proxy, confirm it supports WebSocket upgrades (`Connection: Upgrade` header)
- Run a WebSocket echo test (`wscat -c wss://echo.websocket.org`) to rule out local network issues
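Reconnection with exponential backoff can be sketched independently of any WebSocket library. In the example below, `connect` and `sleep` are caller-supplied callables (hypothetical stand-ins for your connection function and `time.sleep`), the default delays are illustrative rather than recommended values, and catching `OSError` is an assumption about how your library reports connection failures.

```python
import random


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6, jitter=0.0,
                   rng=random.random):
    """Yield the delay in seconds to wait after each failed attempt."""
    delay = base
    for _ in range(attempts):
        # Optional jitter stretches each delay by up to `jitter` (a fraction).
        yield min(delay, cap) * (1.0 + jitter * rng())
        delay *= factor


def connect_with_retry(connect, sleep, **backoff_options):
    """Call connect() until it succeeds, sleeping between failed attempts."""
    for delay in backoff_delays(**backoff_options):
        try:
            return connect()
        except OSError:
            sleep(delay)
    return connect()  # final attempt; any exception propagates to the caller
```

Passing a non-zero `jitter` (for example `0.2`) spreads out reconnection attempts when many clients drop at the same moment.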
Audio Glitches (TTS)
Problem: Choppy or distorted audio playback
- Buffer at least 200ms of audio before starting playback to absorb network jitter
- Use the WebAudio API (`AudioContext`) instead of `<audio>` elements for gapless chunk playback
- Confirm the sample rate in your audio player matches the `sample_rate` from your `init` config
- 24kHz mono linear16 audio requires ~384 kbps of raw PCM (about one third more when base64-encoded); verify your connection can sustain this
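The bandwidth and pre-buffer figures follow from simple arithmetic on the audio format. The sketch below works them out for 16-bit mono PCM; mono is an assumption here, and the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate, bytes_per_sample=2, channels=1):
    """Raw PCM data rate; linear16 uses 2 bytes per sample."""
    return sample_rate * bytes_per_sample * channels


def prebuffer_bytes(sample_rate, buffer_ms=200, bytes_per_sample=2, channels=1):
    """Bytes of decoded audio to collect before starting playback."""
    rate = pcm_bytes_per_second(sample_rate, bytes_per_sample, channels)
    return rate * buffer_ms // 1000


rate = pcm_bytes_per_second(24000)    # 48000 bytes per second
raw_kbps = rate * 8 / 1000            # 384.0 kbps of raw PCM
wire_kbps = raw_kbps * 4 / 3          # 512.0 kbps once base64-encoded
prebuffer = prebuffer_bytes(24000)    # 9600 bytes for 200 ms
```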
Delayed Transcriptions (STT)
Problem: Transcription results lag behind audio
- Send audio in 20-100ms chunks (640-3200 bytes at 16kHz mono linear16) rather than large buffers
- Measure round-trip time with `Date.now()` around send/receive to isolate network vs. server latency
- Confirm `encoding` and `sample_rate` in your `init` config match your actual audio format
- For real-time use, prefer Deepgram Nova, which is optimized for streaming latency
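Chunk sizes in bytes depend on the sample rate and sample width: 16 kHz linear16 mono uses 2 bytes per sample, so 20 ms of audio is 320 samples, or 640 bytes. The following sketch computes the size and splits a buffer accordingly; the function names are hypothetical and mono audio is assumed.

```python
def chunk_size_bytes(sample_rate, chunk_ms, bytes_per_sample=2, channels=1):
    """Size in bytes of a chunk covering chunk_ms milliseconds of PCM audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples * bytes_per_sample * channels


def split_pcm(pcm, size):
    """Split a PCM buffer into consecutive chunks of at most `size` bytes."""
    return [pcm[start:start + size] for start in range(0, len(pcm), size)]


small = chunk_size_bytes(16000, 20)    # 640 bytes
large = chunk_size_bytes(16000, 100)   # 3200 bytes
one_second = bytes(16000 * 2)          # one second of 16 kHz mono silence
chunks = split_pcm(one_second, small)  # 50 chunks of 640 bytes each
```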
Authentication Failures
Problem: Connection rejected with 401
- Pass the API key as a header during the WebSocket handshake: `Authorization: Bearer YOUR_KEY`
- Verify your key is active in the SLNG dashboard
- Check for trailing whitespace or newlines in the key string
- Some WebSocket libraries don’t support custom headers; pass the key as a query parameter (`?token=YOUR_KEY`) if needed
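Both ways of passing the key can be prepared with the standard library before handing them to a WebSocket client. This sketch strips stray whitespace from the key in each case; the helper names are hypothetical, and how the headers are passed depends on the WebSocket library you use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def auth_headers(api_key):
    """Handshake headers carrying the key as a bearer token."""
    return {"Authorization": f"Bearer {api_key.strip()}"}


def url_with_token(url, api_key):
    """Append the key as a token query parameter, keeping existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("token", api_key.strip())]
    return urlunsplit(parts._replace(query=urlencode(query)))


headers = auth_headers("YOUR_KEY\n")
token_url = url_with_token("wss://api.slng.ai/v1/tts/deepgram/aura:2", "YOUR_KEY\n")
```

Keys placed in a URL can end up in proxy and server logs, so the header form is generally the safer choice when your library supports it.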