WebSocket Reference

WebSocket Provider Differences

🎯 Important: Two Ways to Connect

1. Provider-Native Format (Your existing code works!)

  • Use each provider's original API format
  • No changes needed to existing integrations
  • Provider-specific features fully available

2. Unified Protocol (Optional portability)

  • One format works with all providers
  • Automatic translation happens behind the scenes
  • Graceful degradation for unsupported features

Both approaches are fully supported. You can even mix them!

Unified Protocol

All providers accept the same standard commands and parameters (a command that a provider cannot honor is silently ignored; see Command Support Matrix):

```javascript
// Standard initialization
const init = {
  type: 'init',
  model_id: 'model-name',
  voice_id: 'voice-name',
  region: 'us-west',
  audio: { codec: 'pcm16', sample_rate_hz: 24000 },
  style: { temperature: 0.6, stability: 0.5 }
};

// Standard text input
const inputText = { type: 'input_text', id: 'msg-1', text: 'Hello world' };

// Standard commands
const flush = { type: 'flush' }; // Force synthesis
const stop = { type: 'stop' };   // Interrupt synthesis
const clear = { type: 'clear' }; // Clear buffer
```
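As a minimal sketch of how a client might drive the unified protocol, the helper below serializes an init message, one text input, and a flush, in that order. The function name `speak` and the `send` callback are assumptions made for illustration. In a real client `send` would typically wrap `ws.send` on an already-open WebSocket, whose endpoint URL is not specified in this reference.

```javascript
// Hypothetical helper: sends one utterance using the unified message shapes.
// `send` is any function that accepts a string, e.g. (s) => ws.send(s).
function speak(send, { modelId, voiceId, text, id = 'msg-1' }) {
  const messages = [
    {
      type: 'init',
      model_id: modelId,
      voice_id: voiceId,
      audio: { codec: 'pcm16', sample_rate_hz: 24000 }
    },
    { type: 'input_text', id, text },
    { type: 'flush' } // force synthesis of the buffered text
  ];
  for (const message of messages) {
    send(JSON.stringify(message));
  }
  return messages.length;
}

// Usage with an array standing in for a real socket:
const sent = [];
speak((s) => sent.push(s), {
  modelId: 'model-name',
  voiceId: 'voice-name',
  text: 'Hello world'
});
```

Passing `send` in as a parameter keeps the sketch independent of any particular WebSocket library and makes it easy to exercise without a network connection.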

Command Support Matrix

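| Provider | flush | stop | clear | configure | setStyle |
|---|---|---|---|---|---|
| SLNG-Hosted Models | | | | | |
| Orpheus | ✅ | ✅* | ✅ | | |
| Orpheus Indic | ✅ | ✅* | ✅ | | |
| Kokoro | ✅ | ✅* | ✅ | | |
| CosyVoice | ✅ | ✅* | ✅ | | |
| Chatterbox | ✅ | ✅* | ✅ | | |
| XTTS-V2 | ✅ | ✅* | ✅ | | |
| External Providers | | | | | |
| ElevenLabs | ✅ | ❌ | ❌ | | |
| Deepgram | ✅ | ❌ | ✅ | | |
| Cartesia | ✅ | ❌ | ❌ | | |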

*Note: The stop command for SLNG models is implemented with a connection-close workaround: the gateway closes the backend connection and sends an audio_end event.

Provider-Specific Parameter Translation

SLNG-Hosted Models (Orpheus, Orpheus Indic, Kokoro*, CosyVoice, Chatterbox, XTTS-V2)

*Note: Kokoro generates complete audio before streaming begins. The model processes the entire text, then chunks and streams the generated audio. This results in slightly higher initial latency but ensures consistent quality.

Unified → Native Translation:

```javascript
// Unified
const unified = {
  voice_id: 'nova',
  style: { temperature: 0.8, top_p: 0.9, repetition_penalty: 1.2 },
  max_tokens: 2000,
  buffer_size: 15
};

// Translated to SLNG-hosted format
const translated = {
  voice: 'nova',
  temperature: 0.8,
  top_p: 0.9,
  repetition_penalty: 1.2,
  max_tokens: 2000,
  buffer_size: 15
};
```
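The mapping in this example can be sketched as a small function: `voice_id` is renamed to `voice`, and the fields nested under `style` are lifted to the top level. The function name `toSlngHosted` is hypothetical, and the pass-through of any other top-level field is an assumption inferred from how `max_tokens` and `buffer_size` behave in the example. This is an illustration, not the gateway's actual code.

```javascript
// Sketch of the unified -> SLNG-hosted mapping shown in the example above.
function toSlngHosted(unified) {
  const { voice_id, style = {}, ...rest } = unified;
  return {
    voice: voice_id, // voice_id is renamed to voice
    ...style,        // style fields are lifted to the top level
    ...rest          // assumed: remaining fields pass through unchanged
  };
}

const slngNative = toSlngHosted({
  voice_id: 'nova',
  style: { temperature: 0.8, top_p: 0.9, repetition_penalty: 1.2 },
  max_tokens: 2000,
  buffer_size: 15
});
```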

ElevenLabs

Unified → Native Translation:

```javascript
// Unified
const unified = {
  voice_id: '21m00Tcm4TlvDq8ikWAM',
  style: {
    stability: 0.6,
    similarity_boost: 0.8,
    style: 0.2,
    use_speaker_boost: true
  },
  chunk_length_schedule: [120, 160, 250]
};

// Translated to ElevenLabs format
const translated = {
  voice_settings: {
    stability: 0.6,
    similarity_boost: 0.8,
    style: 0.2,
    use_speaker_boost: true
  },
  generation_config: { chunk_length_schedule: [120, 160, 250] }
};
```
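A hedged sketch of the same translation as code: `style` becomes `voice_settings`, and `chunk_length_schedule` moves under `generation_config`. The function name `toElevenLabs` is hypothetical. Because the translated message in the example does not contain `voice_id`, the sketch leaves it out as well; how the voice is conveyed to the provider is not shown in this reference.

```javascript
// Sketch of the unified -> ElevenLabs message mapping shown in the example above.
function toElevenLabs(unified) {
  const { style = {}, chunk_length_schedule } = unified;
  const native = { voice_settings: { ...style } };
  if (chunk_length_schedule !== undefined) {
    // Only emit generation_config when a schedule was supplied.
    native.generation_config = { chunk_length_schedule };
  }
  return native;
}

const elevenNative = toElevenLabs({
  voice_id: '21m00Tcm4TlvDq8ikWAM',
  style: { stability: 0.6, similarity_boost: 0.8, style: 0.2, use_speaker_boost: true },
  chunk_length_schedule: [120, 160, 250]
});
```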

Deepgram

Unified → Native Translation:

```javascript
// Unified
const unified = {
  model_id: 'aura-asteria-en',
  audio: { encoding: 'linear16', sample_rate_hz: 24000, container: 'none' }
};

// Translated to Deepgram URL params
// wss://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=linear16&sample_rate=24000
```
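Since the Deepgram translation produces URL query parameters rather than a JSON message, it can be sketched with the standard `URLSearchParams` class. The function name `toDeepgramUrl` is an assumption. The sketch follows the example in omitting `container` from the resulting URL.

```javascript
// Sketch: build the Deepgram speak URL from unified parameters.
function toDeepgramUrl(unified) {
  const params = new URLSearchParams({
    model: unified.model_id,
    encoding: unified.audio.encoding,
    sample_rate: String(unified.audio.sample_rate_hz)
  });
  // Interpolating URLSearchParams calls its toString(), which URL-encodes values.
  return `wss://api.deepgram.com/v1/speak?${params}`;
}

const deepgramUrl = toDeepgramUrl({
  model_id: 'aura-asteria-en',
  audio: { encoding: 'linear16', sample_rate_hz: 24000, container: 'none' }
});
```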

Cartesia

Unified → Native Translation:

```javascript
// Unified
const unified = {
  model_id: 'sonic-english',
  voice_id: '694f9389-aac1-45b6-b726-9d9369183238',
  language: 'en'
};

// Translated to Cartesia format
const translated = {
  context_id: 'uuid-generated',
  model_id: 'sonic-english',
  voice: { mode: 'id', id: '694f9389-aac1-45b6-b726-9d9369183238' },
  output_format: { container: 'raw', encoding: 'pcm_f32le', sample_rate: 24000 },
  language: 'en'
};
```
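The Cartesia translation wraps the voice ID in a `voice` object and adds a generated `context_id`. The sketch below assumes Node.js and uses `randomUUID` from `node:crypto` for that ID. The function name `toCartesia` is hypothetical, and treating the `output_format` values as fixed defaults is an assumption based on the example, which supplies no `audio` block in the unified input.

```javascript
const { randomUUID } = require('node:crypto');

// Sketch of the unified -> Cartesia mapping shown in the example above.
function toCartesia(unified, contextId = randomUUID()) {
  return {
    context_id: contextId, // generated per request unless supplied
    model_id: unified.model_id,
    voice: { mode: 'id', id: unified.voice_id },
    // Assumed defaults, copied from the translated example.
    output_format: { container: 'raw', encoding: 'pcm_f32le', sample_rate: 24000 },
    language: unified.language
  };
}

const cartesiaNative = toCartesia({
  model_id: 'sonic-english',
  voice_id: '694f9389-aac1-45b6-b726-9d9369183238',
  language: 'en'
});
```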

Command Behavior Differences

Flush Command

  • SLNG-hosted: Sends {type: "control", action: "flush"} - forces synthesis without closing
  • ElevenLabs: Sends {text: "", flush: true} - completes current generation
  • Deepgram: Sends {type: "Flush"} - processes buffered text
  • Cartesia: Sends {continue: false} - finalizes current context

Stop Command

  • SLNG-hosted: Gateway closes backend connection to stop generation immediately, sends audio_end event
  • ElevenLabs: ❌ Not supported (command ignored)
  • Deepgram: ❌ Not supported (command ignored)
  • Cartesia: ❌ Not supported (command ignored)
  • Implementation note (SLNG-hosted): The gateway uses this connection-close workaround because the backend models don't respond to control commands

Clear Command

  • SLNG-hosted: Sends {type: "control", action: "clear"} - clears buffer without synthesis
  • ElevenLabs: ❌ Not supported (command ignored)
  • Deepgram: Sends {type: "Clear"} - clears internal buffer
  • Cartesia: ❌ Not supported (command ignored)
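The per-provider behavior listed above can be summarized as a lookup from a unified command to the native message that is sent. This is a sketch only: the provider keys (`'slng-hosted'`, `'elevenlabs'`, and so on) and the function name `nativePayload` are hypothetical, and a `null` result stands for "no message is sent", which covers both ignored commands and the SLNG stop command (handled by closing the backend connection rather than by a message).

```javascript
// Native messages per provider, taken from the lists above.
const NATIVE_COMMANDS = {
  'slng-hosted': {
    flush: { type: 'control', action: 'flush' },
    clear: { type: 'control', action: 'clear' }
    // stop: no message; the gateway closes the backend connection instead
  },
  elevenlabs: { flush: { text: '', flush: true } },
  deepgram: { flush: { type: 'Flush' }, clear: { type: 'Clear' } },
  cartesia: { flush: { continue: false } }
};

// Returns the native message for a unified command, or null if none is sent.
function nativePayload(provider, command) {
  if (!Object.hasOwn(NATIVE_COMMANDS, provider)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  const table = NATIVE_COMMANDS[provider];
  return Object.hasOwn(table, command) ? table[command] : null;
}
```

For example, `nativePayload('deepgram', 'clear')` yields the `{ type: 'Clear' }` message, while `nativePayload('elevenlabs', 'clear')` yields `null`.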

Error Handling

Unsupported commands are silently ignored: the client receives no error, and the command has no effect. This keeps client code working unchanged when you switch between providers.
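Because an ignored command produces no error, a client that depends on, say, `stop` may want to detect the situation itself. The sketch below is one possible approach, not part of the gateway API: the provider keys, the `SUPPORTED` table, and the `sendCommand` function are all hypothetical, with the supported commands taken from the Command Behavior Differences section.

```javascript
// Commands each provider acts on (provider keys are illustrative).
const SUPPORTED = {
  'slng-hosted': ['flush', 'stop', 'clear'],
  elevenlabs: ['flush'],
  deepgram: ['flush', 'clear'],
  cartesia: ['flush']
};

// Sends a unified command and warns locally if the provider will ignore it.
// `send` accepts a string, e.g. (s) => ws.send(s). Returns true if supported.
function sendCommand(send, provider, command, warn = console.warn) {
  const supported = (SUPPORTED[provider] ?? []).includes(command);
  if (!supported) {
    warn(`'${command}' is ignored by ${provider}; the gateway returns no error`);
  }
  // Still safe to send: an unsupported command simply has no effect.
  send(JSON.stringify({ type: command }));
  return supported;
}
```

The return value lets calling code fall back to another strategy, such as closing and reopening the connection, when a command would have no effect.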

Rate Limits

Account Tier Limits (Gateway-Level)

These limits apply to ALL providers:

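| Tier | Connections | Duration | Messages/min | Text/msg | Message Size |
|---|---|---|---|---|---|
| Free | 1 | 5 min | 20 | 500 chars | 16KB |
| Pro ($49) | 5 | 30 min | 60 | 2K chars | 32KB |
| Enterprise | 50 | 120 min | 120 | 10K chars | 128KB |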
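A client can check a message against the per-message limits before sending it. The sketch below makes several assumptions that this reference does not confirm: "K" is read as 1,000 characters, "KB" as 1,024 bytes, characters are counted with JavaScript string length, and message size is measured on the serialized JSON. The names `TIER_LIMITS` and `checkMessage` are hypothetical.

```javascript
// Per-message limits from the tier table (units interpreted as described above).
const TIER_LIMITS = {
  free: { maxChars: 500, maxBytes: 16 * 1024 },
  pro: { maxChars: 2000, maxBytes: 32 * 1024 },
  enterprise: { maxChars: 10000, maxBytes: 128 * 1024 }
};

// Returns a list of limit violations; an empty list means the message fits.
function checkMessage(tier, message) {
  const limits = TIER_LIMITS[tier];
  if (!limits) throw new Error(`Unknown tier: ${tier}`);
  const problems = [];
  const text = message.text ?? '';
  if (text.length > limits.maxChars) {
    problems.push(`text has ${text.length} chars, limit is ${limits.maxChars}`);
  }
  const bytes = Buffer.byteLength(JSON.stringify(message), 'utf8');
  if (bytes > limits.maxBytes) {
    problems.push(`message is ${bytes} bytes, limit is ${limits.maxBytes}`);
  }
  return problems;
}
```

Splitting long text into several `input_text` messages is one way to stay under the Text/msg limit, at the cost of using more of the Messages/min budget.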

Provider-Specific Limits (Additional)

| Provider | Limit | Notes |
|---|---|---|
| SLNG-Hosted Models | Gateway limits only | No additional provider limits |
| ElevenLabs | API key based | Character limits by tier |
| Deepgram | 2400 chars/min | 5 concurrent, 60-min max |
| Cartesia | 10x concurrency | WebSocket connections = 10x your concurrency limit |

Audio Output Formats

All providers stream audio through our gateway in a standardized format:

| Provider | Default Format | Sample Rate | Delivery Method | Default Voice |
|---|---|---|---|---|
| Orpheus | PCM 16-bit | 24kHz | base64 in JSON events | tara |
| Orpheus Indic | PCM 16-bit | 24kHz | base64 in JSON events | kanak |
| Kokoro | PCM 16-bit | 24kHz | base64 in JSON events (after full generation) | af |
| CosyVoice | PCM 16-bit | 24kHz | base64 in JSON events | emma |
| Chatterbox | PCM 16-bit | 24kHz | base64 in JSON events | nova |
| XTTS-V2 | PCM 16-bit | 24kHz | base64 in JSON events | Claribel Dervla |
| ElevenLabs | MP3/PCM (configurable) | 22.05kHz-44.1kHz | base64 in JSON events* | |
| Deepgram | Linear16 PCM | 8kHz-48kHz (configurable) | base64 in JSON events* | |
| Cartesia | PCM 16-bit LE | 8kHz-44.1kHz (configurable) | base64 in JSON events* | |

*Note: External providers send binary frames natively, but our gateway converts them to base64-encoded JSON events for consistency with the unified protocol.
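Since audio arrives as base64 inside JSON events, a client has to decode it before playback. The sketch below assumes Node.js and 16-bit little-endian PCM. The event shape (`type: 'audio_chunk'` with a `data` field) is hypothetical, because this reference does not give the field names, and the byte order is stated only for Cartesia.

```javascript
// Decodes a base64 payload of 16-bit little-endian PCM into an Int16Array.
function decodePcm16(base64) {
  const bytes = Buffer.from(base64, 'base64');
  // A trailing odd byte, if any, is dropped.
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = bytes.readInt16LE(i * 2);
  }
  return samples;
}

// Hypothetical event; the field names are placeholders, not the gateway's schema.
const audioEvent = {
  type: 'audio_chunk',
  data: Buffer.from([0x01, 0x00, 0xff, 0xff]).toString('base64')
};
const pcmSamples = decodePcm16(audioEvent.data);
```

Reading each sample with `readInt16LE` avoids depending on the host machine's byte order, which a direct `Int16Array` view over the buffer would not.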

Best Practices

  1. Use unified parameters - They work across all providers
  2. Check command support - Refer to the matrix above
  3. Handle graceful degradation - Unsupported commands are ignored
  4. Monitor rate limits - Each provider has different limits
  5. Test with different providers - Behavior may vary slightly despite unified interface