WebSocket APIs

Real-time, bidirectional streaming for Speech-to-Text and Text-to-Speech with persistent connections and low latency.

Quick Start

1. Get Access

WebSocket endpoints require special access:

  • Existing customers: Contact your account manager
  • New users: Email [email protected] with your use case

2. Connect & Authenticate

JavascriptCode
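// Node.js with the `ws` package; the browser WebSocket constructor cannot set custom headers.
const WebSocket = require('ws');
const ws = new WebSocket('wss://api.slng.ai/v1/tts/orpheus-websocket-stream', {
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
});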

3. Choose Your Protocol

Option A: Provider-Native Format (Your existing code works!)

JavascriptCode
// ElevenLabs native format
ws.send(JSON.stringify({
  text: "Hello world",
  voice_settings: { stability: 0.5 }
}));

// Orpheus native format
ws.send("Hello world");
ws.send(JSON.stringify({ type: "control", action: "flush" }));

Option B: Unified Protocol (Write once, run anywhere)

JavascriptCode
// Same code for ALL providers
ws.send(JSON.stringify({ type: 'init', voice_id: 'tara' }));
ws.send(JSON.stringify({ type: 'input_text', text: 'Hello world' }));

Both work! Mix and match as needed.


Available Endpoints

Text-to-Speech (TTS)

| Endpoint | Description | Voices/Languages |
| --- | --- | --- |
| wss://api.slng.ai/v1/tts/orpheus-websocket-stream | High-quality conversational TTS | tara, luna, stella, athena, hera |
| wss://api.slng.ai/v1/tts/orpheus-indic-websocket-stream | Indic languages TTS (8 languages) | kanak (default), voices for Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam |
| wss://api.slng.ai/v1/tts/kokoro-websocket-stream | Japanese/English with emotion* | af (default), af_bella, af_sky, am_adam |
| wss://api.slng.ai/v1/tts/cosyvoice-websocket-stream | Voice cloning capabilities | emma (default) + custom cloning |
| wss://api.slng.ai/v1/tts/chatterbox-websocket-stream | Fast conversational TTS | nova (default), echo, sky, zara, max |
| wss://api.slng.ai/v1/tts/xtts-websocket-stream | Advanced voice cloning | Claribel Dervla (default) + custom cloning |
| wss://api.slng.ai/v1/tts/elevenlabs-websocket-stream | Premium voices with SSML | 100+ premium voices |
| wss://api.slng.ai/v1/tts/deepgram-websocket-stream | 40+ voices, low latency | aura-* model family |
| wss://api.slng.ai/v1/tts/cartesia-websocket-stream | Ultra-low latency | Professional voices |

*Note: Kokoro generates complete audio before streaming, resulting in slightly higher initial latency but consistent quality. Audio is chunked and streamed once generation completes.

Speech-to-Text (STT)

| Endpoint | Description | Languages |
| --- | --- | --- |
| wss://api.slng.ai/v1/stt/kyutai-websocket-stream | Open-source real-time transcription | 12+ languages |
| wss://api.slng.ai/v1/stt/whisper-websocket-stream | OpenAI Whisper with VAD | 99 languages |
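Every endpoint in the two tables above follows the same URL pattern, so a client can build the URL from a short name. The sketch below assumes that pattern holds for all listed endpoints; `endpointUrl` is a hypothetical helper, not part of any SDK.

```javascript
// Slugs copied from the TTS and STT endpoint tables.
const TTS_PROVIDERS = [
  'orpheus', 'orpheus-indic', 'kokoro', 'cosyvoice', 'chatterbox',
  'xtts', 'elevenlabs', 'deepgram', 'cartesia',
];
const STT_MODELS = ['kyutai', 'whisper'];

// Hypothetical helper: build a WebSocket URL and reject unknown names early.
function endpointUrl(kind, name) {
  const known = kind === 'tts' ? TTS_PROVIDERS : kind === 'stt' ? STT_MODELS : null;
  if (!known) throw new Error(`Unknown API kind: ${kind}`);
  if (!known.includes(name)) throw new Error(`Unknown ${kind} endpoint: ${name}`);
  return `wss://api.slng.ai/v1/${kind}/${name}-websocket-stream`;
}
```

Validating the name on the client turns a typo into an immediate error instead of a failed handshake.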

Commands Reference

Universal Commands (Work Everywhere)

| Command | Description | Example |
| --- | --- | --- |
| init | Initialize session with optional context | {"type": "init", "voice_id": "tara", "context_id": "session-123"} |
| input_text | Send text | {"type": "input_text", "text": "Hello"} |
| flush | Force synthesis | {"type": "flush"} |
| stop | Interrupt synthesis** | {"type": "stop"} |
| clear | Clear buffer | {"type": "clear"} |
| configure | Update settings (ElevenLabs/Cartesia) | {"type": "configure", "voice_id": "new"} |
| ping | Keep alive | {"type": "ping"} |

**Note: how the stop command is implemented:

  • SLNG Models: Implemented as a connection-close workaround. The gateway closes the backend WebSocket connection to stop generation immediately, then sends an audio_end event.
  • ElevenLabs/Cartesia/Deepgram: Not supported. The command is ignored without raising an error.
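A small wrapper can keep command names consistent and record which providers ignore `stop`. This is a minimal sketch: `command` and `stopInterrupts` are hypothetical helpers, and the provider slugs are assumed to match the endpoint names.

```javascript
// Command names from the Universal Commands table.
const UNIVERSAL_COMMANDS = new Set([
  'init', 'input_text', 'flush', 'stop', 'clear', 'configure', 'ping',
]);

// Providers where the note says `stop` is ignored.
const STOP_IGNORED = new Set(['elevenlabs', 'cartesia', 'deepgram']);

// Hypothetical helper: serialize a unified-protocol command as JSON text.
function command(type, fields = {}) {
  if (!UNIVERSAL_COMMANDS.has(type)) {
    throw new Error(`Unknown command type: ${type}`);
  }
  return JSON.stringify({ type, ...fields });
}

// Hypothetical helper: true when `stop` actually interrupts synthesis.
function stopInterrupts(provider) {
  return !STOP_IGNORED.has(provider);
}
```

With a connected socket, `ws.send(command('input_text', { text: 'Hello' }))` followed by `ws.send(command('flush'))` mirrors the unified-protocol example. When `stopInterrupts(provider)` is false, a client that needs to cut audio short has to discard incoming chunks itself.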

Provider-Specific Parameters (Still Supported!)

ElevenLabs

JavascriptCode
{
  "type": "init",
  "voice_id": "21m00Tcm4TlvDq8ikWAM",
  "voice_settings": {                // ElevenLabs-specific
    "stability": 0.5,
    "similarity_boost": 0.75,
    "use_speaker_boost": true
  },
  "generation_config": {             // ElevenLabs-specific
    "chunk_length_schedule": [120, 160, 250]
  }
}

Cartesia

JavascriptCode
{
  "type": "init",
  "voice_id": "694f9389",
  "context_id": "session-123",       // Cartesia-specific
  "language": "en"                   // Cartesia-specific
}

SLNG Models (Orpheus, Chatterbox, CosyVoice, XTTS, Kokoro)

JavascriptCode
{
  "type": "init",
  "voice_id": "tara",                // Or: nova, emma, af, Claribel Dervla
  "buffer_size": 10,                 // SLNG-specific
  "max_tokens": 6144,                // SLNG-specific
  "repetition_penalty": 1.3          // SLNG-specific
}

Orpheus Indic (8 Indian Languages)

JavascriptCode
{
  "type": "init",
  "voice_id": "kanak",
  "language": "hi"                   // hi, bn, ta, te, mr, gu, kn, ml
  // Supported languages:
  // hi = Hindi, bn = Bengali, ta = Tamil, te = Telugu
  // mr = Marathi, gu = Gujarati, kn = Kannada, ml = Malayalam
}
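Because Orpheus Indic accepts only the eight language codes listed above, it can help to validate the code before sending `init`. A minimal sketch, where `indicInit` is a hypothetical helper and the default voice `kanak` comes from the endpoint table:

```javascript
// Language codes from the Orpheus Indic example.
const INDIC_LANGUAGES = {
  hi: 'Hindi', bn: 'Bengali', ta: 'Tamil', te: 'Telugu',
  mr: 'Marathi', gu: 'Gujarati', kn: 'Kannada', ml: 'Malayalam',
};

// Hypothetical helper: build an init message for the orpheus-indic endpoint.
function indicInit(language, voiceId = 'kanak') {
  if (!Object.prototype.hasOwnProperty.call(INDIC_LANGUAGES, language)) {
    const supported = Object.keys(INDIC_LANGUAGES).join(', ');
    throw new Error(`Unsupported language "${language}"; expected one of: ${supported}`);
  }
  return { type: 'init', voice_id: voiceId, language };
}
```

Send the result with `ws.send(JSON.stringify(indicInit('ta')))`.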

Session Management with context_id

The context_id parameter enables session tracking and correlation across WebSocket connections:

How it works:

  • Optional: If not provided, a UUID is auto-generated
  • Persistent: The same context_id is included in all server events
  • Provider Support:
    • Cartesia: Native support - maintains conversation context
    • All Others: Gateway-level tracking for correlation

Usage:

JavascriptCode
// Initialize with custom context_id
{
  "type": "init",
  "context_id": "conversation-123",  // Your custom ID
  "voice_id": "tara"
}

// All events will include the context_id
{
  "type": "audio_chunk",
  "data": "...",
  "context_id": "conversation-123"   // Same ID returned
}

Benefits:

  • Track sessions across reconnections
  • Correlate events in logging
  • Maintain conversation context (Cartesia)
  • Debug multi-session scenarios
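For logging and debugging across several sessions, events can be bucketed by their `context_id`. The sketch below is illustrative only: `newContextId` and `groupByContext` are hypothetical helpers, and the `prefix-uuid` ID format is an assumption rather than a documented requirement.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: create a readable, unique context_id.
function newContextId(prefix = 'conversation') {
  return `${prefix}-${randomUUID()}`;
}

// Hypothetical helper: group parsed server events by context_id.
function groupByContext(events) {
  const groups = new Map();
  for (const event of events) {
    const key = event.context_id;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(event);
  }
  return groups;
}
```

Reusing the same ID after a reconnect keeps both connections' events in one bucket.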

Events from Server

TTS Events

| Event | Description | Fields |
| --- | --- | --- |
| ready | Session ready | session, capabilities, context_id |
| audio_chunk | Audio data | data (base64), sequence, duration_ms, context_id |
| audio_end | Synthesis done | id, total_duration_ms, context_id |
| error | Error occurred | code, message, retryable, context_id |
| session_ended | Connection closed | reason, usage, context_id |
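The `audio_chunk` fields are enough to rebuild a complete utterance on the client. This sketch assumes `sequence` is a number that increases with each chunk and that `duration_ms` is per chunk; `assembleAudio` is a hypothetical helper.

```javascript
// Hypothetical helper: order audio_chunk events and join their payloads.
function assembleAudio(chunks) {
  // Sort a copy so the caller's array is left untouched.
  const ordered = [...chunks].sort((a, b) => a.sequence - b.sequence);
  const audio = Buffer.concat(
    ordered.map((chunk) => Buffer.from(chunk.data, 'base64'))
  );
  const durationMs = ordered.reduce(
    (sum, chunk) => sum + (chunk.duration_ms || 0), 0
  );
  return { audio, durationMs };
}
```

A client would typically collect chunks until `audio_end` arrives and then call `assembleAudio` once. Comparing `durationMs` with `total_duration_ms` from `audio_end` is one way to spot missing chunks.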

STT Events

| Event | Description | Fields |
| --- | --- | --- |
| ready | Session ready | model_id, sample_rate, encoding |
| transcript | Partial transcription | text, is_final, confidence, timestamp |
| final_transcript | Complete transcription | text, audio_duration_seconds, timestamp |
| error | Error occurred | code, message |
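Partial and final transcripts usually need to be merged into one running text for display. The sketch below rests on two assumptions that the event table does not confirm: each partial `transcript` replaces the previous partial, and `final_transcript` carries the full text of the session. `TranscriptBuffer` is a hypothetical class.

```javascript
// Hypothetical helper: keep a running transcript from STT events.
class TranscriptBuffer {
  constructor() {
    this.finalized = []; // segments marked is_final
    this.partial = '';   // latest in-progress segment
  }

  // Feed one parsed server event; returns the current display text.
  handle(msg) {
    if (msg.type === 'transcript') {
      if (msg.is_final) {
        this.finalized.push(msg.text);
        this.partial = '';
      } else {
        this.partial = msg.text; // assumed to supersede the previous partial
      }
    } else if (msg.type === 'final_transcript') {
      this.finalized = [msg.text]; // assumed to be the complete transcription
      this.partial = '';
    }
    return this.text();
  }

  text() {
    return [...this.finalized, this.partial].filter(Boolean).join(' ');
  }
}
```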

Complete Example

JavascriptCode
const WebSocket = require('ws');

class TTSClient {
  constructor(apiKey, provider = 'orpheus') {
    this.apiKey = apiKey;
    this.provider = provider;
    this.ws = null;
  }

  async connect(voiceId = 'tara', contextId = null) {
    const url = `wss://api.slng.ai/v1/tts/${this.provider}-websocket-stream`;
    this.ws = new WebSocket(url, {
      headers: { 'Authorization': `Bearer ${this.apiKey}` }
    });

    return new Promise((resolve, reject) => {
      // Reject on transport-level failures (for example, a refused handshake)
      this.ws.on('error', reject);

      this.ws.on('open', () => {
        // Use unified protocol - works with ALL providers
        this.ws.send(JSON.stringify({
          type: 'init',
          voice_id: voiceId,
          context_id: contextId || `session-${Date.now()}`
        }));
      });

      this.ws.on('message', (data) => {
        const msg = JSON.parse(data.toString());
        if (msg.type === 'ready') {
          console.log('Connected to', this.provider);
          resolve();
        }
        if (msg.type === 'audio_chunk') {
          this.handleAudio(msg.data);
        }
        if (msg.type === 'error') {
          reject(new Error(msg.message));
        }
      });
    });
  }

  synthesize(text) {
    // Unified protocol - same for all providers
    this.ws.send(JSON.stringify({ type: 'input_text', text: text }));
    this.ws.send(JSON.stringify({ type: 'flush' }));
  }

  handleAudio(base64Data) {
    const buffer = Buffer.from(base64Data, 'base64');
    // Process PCM audio (24kHz, 16-bit, mono)
    console.log(`Received ${buffer.length} bytes of audio`);
  }

  close() {
    this.ws.close();
  }
}

// Usage - works with ANY provider!
async function main() {
  const client = new TTSClient('YOUR_API_KEY', 'orpheus');
  await client.connect('tara');
  client.synthesize('Hello from SLNG WebSocket API!');

  // Switch providers with zero code changes
  const elevenlabs = new TTSClient('YOUR_API_KEY', 'elevenlabs');
  await elevenlabs.connect('21m00Tcm4TlvDq8ikWAM');
  elevenlabs.synthesize('Same code, different provider!');
}

main().catch(console.error);
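The example above receives raw PCM (24kHz, 16-bit, mono), which most players cannot open directly. Prepending a standard 44-byte WAV header makes it playable. The sketch assumes the samples are little-endian, which the example does not state; `pcmToWav` is a hypothetical helper.

```javascript
// Hypothetical helper: wrap raw PCM samples in a canonical 44-byte WAV header.
function pcmToWav(pcm, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4);          // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                      // fmt chunk size
  header.writeUInt16LE(1, 20);                       // format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);              // payload size
  return Buffer.concat([header, pcm]);
}
```

After all chunks have been collected, `fs.writeFileSync('out.wav', pcmToWav(Buffer.concat(buffers)))` would produce a playable file, where `buffers` is whatever array the client stored decoded chunks in.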

STT WebSocket Example

JavascriptCode
const WebSocket = require('ws');
const fs = require('fs');

class STTClient {
  constructor(apiKey, model = 'kyutai') {
    this.apiKey = apiKey;
    this.model = model;
    this.ws = null;
  }

  async connect(language = 'en') {
    const url = `wss://api.slng.ai/v1/stt/${this.model}-websocket-stream?language=${language}`;
    this.ws = new WebSocket(url, {
      headers: { 'Authorization': `Bearer ${this.apiKey}` }
    });

    return new Promise((resolve, reject) => {
      // Reject on transport-level failures (for example, a refused handshake)
      this.ws.on('error', reject);

      this.ws.on('open', () => {
        // Initialize STT session
        this.ws.send(JSON.stringify({
          type: 'start',
          sample_rate: 16000,
          encoding: 'pcm16',
          language: language
        }));
      });

      this.ws.on('message', (data) => {
        const msg = JSON.parse(data.toString());
        if (msg.type === 'ready') {
          console.log('STT ready:', msg.model_id);
          resolve();
        }
        if (msg.type === 'transcript') {
          console.log(`${msg.is_final ? 'Final' : 'Partial'}: ${msg.text}`);
        }
        if (msg.type === 'final_transcript') {
          console.log(`Complete transcription (${msg.audio_duration_seconds}s): ${msg.text}`);
        }
        if (msg.type === 'error') {
          reject(new Error(msg.message));
        }
      });
    });
  }

  streamAudio(audioBuffer) {
    // Send raw PCM audio (16kHz, 16-bit, mono).
    // Each send must fit your tier's max audio chunk size (see Limits & Safety).
    this.ws.send(audioBuffer);
  }

  finalize() {
    // Get final transcription
    this.ws.send(JSON.stringify({ type: 'finalize' }));
  }
}

// Usage
async function transcribe() {
  const stt = new STTClient('YOUR_API_KEY', 'kyutai');
  await stt.connect('en');

  // Stream audio from file or microphone
  const audioData = fs.readFileSync('audio.pcm');
  stt.streamAudio(audioData);

  // Get final result
  stt.finalize();
}

transcribe().catch(console.error);
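A whole audio file sent in one message will usually exceed the per-tier audio chunk limit, so longer recordings need to be sliced first. A minimal sketch, assuming "16KB" means 16 × 1024 bytes and that slices should stay aligned to 2-byte samples; `audioChunks` is a hypothetical helper.

```javascript
// Hypothetical helper: yield slices of a PCM buffer no larger than maxBytes.
function* audioChunks(audio, maxBytes = 16 * 1024) {
  // Round down to an even size so a 16-bit sample is never split in two.
  const size = maxBytes - (maxBytes % 2);
  if (size <= 0) throw new RangeError('maxBytes must be at least 2');
  for (let offset = 0; offset < audio.length; offset += size) {
    yield audio.subarray(offset, offset + size); // a view, not a copy
  }
}
```

With the client above, `for (const chunk of audioChunks(audioData)) stt.streamAudio(chunk);` would replace the single `streamAudio` call. Pacing the sends may also be needed to stay within the messages-per-minute limit.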

Limits & Safety

Gateway-Level Limits (Applied to All Providers)

Account Tier Limits

| Limit | Free | Pro ($49/mo) | Enterprise |
| --- | --- | --- | --- |
| Concurrent connections | 1 | 5 | 50 |
| Max connection duration | 5 minutes | 30 minutes | 120 minutes |
| Max message size | 16KB | 32KB | 128KB |
| Messages per minute | 20 | 60 | 120 |
| Max text per message | 500 chars | 2,000 chars | 10,000 chars |
| Max audio chunk | 16KB | 32KB | 64KB |
| Idle timeout | 5 minutes | 5 minutes | 5 minutes |
| Heartbeat interval | 30 seconds | 30 seconds | 30 seconds |
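Text longer than the per-message limit has to be split before it is sent. The sketch below breaks on whitespace and hard-splits only words that exceed the limit on their own. It counts JavaScript string length (UTF-16 code units), which may differ from how the gateway counts characters; `splitText` is a hypothetical helper.

```javascript
// Hypothetical helper: split text into messages of at most maxChars each.
function splitText(text, maxChars = 500) {
  const words = text.split(/\s+/).filter(Boolean);
  const messages = [];
  let current = '';
  for (const word of words) {
    if (word.length > maxChars) {
      // A single oversized word: flush what we have, then cut it into pieces.
      if (current) {
        messages.push(current);
        current = '';
      }
      for (let i = 0; i < word.length; i += maxChars) {
        messages.push(word.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars) {
      messages.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) messages.push(current);
  return messages;
}
```

Each returned string can be sent as its own `input_text` message. The default of 500 matches the Free tier; pass 2000 or 10000 for Pro or Enterprise.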

Provider-Specific Limits

These are enforced by the upstream providers in addition to our gateway limits:

| Provider | Additional Limits |
| --- | --- |
| SLNG-Hosted Models | No additional limits |
| ElevenLabs | Character limits based on API key tier |
| Deepgram | 2400 chars/min, 5 concurrent connections |
| Cartesia | 10x concurrency limit for WebSocket connections |
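A client-side budget can help avoid hitting a per-minute character limit such as Deepgram's 2400 chars/min. The sketch below uses a sliding 60-second window, which is an assumption: the table does not say how the provider measures the minute. `CharBudget` is a hypothetical class, and its clock is injectable so it can be tested without waiting.

```javascript
// Hypothetical helper: sliding-window character budget.
class CharBudget {
  constructor(maxChars = 2400, windowMs = 60000, now = Date.now) {
    this.maxChars = maxChars;
    this.windowMs = windowMs;
    this.now = now;  // clock function returning milliseconds
    this.sent = [];  // entries of { at, chars } still inside the window
  }

  // Returns true and records the text if it fits, false otherwise.
  tryConsume(text) {
    const t = this.now();
    this.sent = this.sent.filter((entry) => t - entry.at < this.windowMs);
    const used = this.sent.reduce((sum, entry) => sum + entry.chars, 0);
    if (used + text.length > this.maxChars) return false;
    this.sent.push({ at: t, chars: text.length });
    return true;
  }
}
```

When `tryConsume` returns false, the caller can queue the text and retry after a short delay instead of letting the provider reject it.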

Error Codes

| Code | Description |
| --- | --- |
| 1000 | Normal closure |
| 4000 | Authentication failed |
| 4001 | Rate limit exceeded |
| 4002 | Message too large |
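These codes arrive as WebSocket close codes, so a client can decide how to react when the connection closes. In the sketch below, the descriptions come from the table, but the `retry` flags are assumptions about sensible client behaviour, not documented guidance. `describeClose` is a hypothetical helper.

```javascript
// Descriptions from the Error Codes table; retry flags are assumptions.
const CLOSE_CODES = {
  1000: { description: 'Normal closure', retry: false },
  4000: { description: 'Authentication failed', retry: false },
  4001: { description: 'Rate limit exceeded', retry: true },
  4002: { description: 'Message too large', retry: false },
};

// Hypothetical helper: look up a close code, with a safe default.
function describeClose(code) {
  return CLOSE_CODES[code] || {
    description: `Unrecognized close code ${code}`,
    retry: false,
  };
}
```

With the `ws` package, the `close` event passes the code as its first argument, so `ws.on('close', (code) => console.log(describeClose(code).description))` is enough to log it. Retries for 4001 should back off rather than reconnect immediately.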

Support

  • Email: [email protected]
  • Issues: Include session ID and timestamp
  • Feature requests: Contact your account manager


HTTP vs WebSocket Limits

Limits by Account Tier

Free Tier

| Feature | HTTP API | WebSocket API |
| --- | --- | --- |
| Concurrent requests/connections | 5 | 1 |
| Max text length | 1,000 chars | 500 chars/message |
| Max audio duration | 60 seconds | N/A (streaming) |
| Max input tokens (LLM) | 4,000 | N/A |
| Max output tokens (LLM) | 4,000 | N/A |
| Requests/Messages per minute | 30 | 20 |
| Max connection duration | Request-based | 5 minutes |
| Max message size | Request body limits | 16KB |

Pro Tier ($49/month)

| Feature | HTTP API | WebSocket API |
| --- | --- | --- |
| Concurrent requests/connections | 25 | 5 |
| Max text length | 5,000 chars | 2,000 chars/message |
| Max audio duration | 300 seconds | N/A (streaming) |
| Max input tokens (LLM) | 32,000 | N/A |
| Max output tokens (LLM) | 32,000 | N/A |
| Requests/Messages per minute | 300 | 60 |
| Max connection duration | Request-based | 30 minutes |
| Max message size | Request body limits | 32KB |

Enterprise Tier

| Feature | HTTP API | WebSocket API |
| --- | --- | --- |
| Concurrent requests/connections | 100 | 50 |
| Max text length | 10,000 chars | 10,000 chars/message |
| Max audio duration | 600 seconds | N/A (streaming) |
| Max input tokens (LLM) | 128,000 | N/A |
| Max output tokens (LLM) | 128,000 | N/A |
| Requests/Messages per minute | 1,000 | 120 |
| Max connection duration | Request-based | 120 minutes |
| Max message size | Request body limits | 128KB |
