WebSocket Provider Differences

🎯 Important: Two Ways to Connect

1. Provider-Native Format (Your existing code works!)

Use each provider's original API format
No changes needed to existing integrations
Provider-specific features fully available

2. Unified Protocol (Optional portability)

One format works with all providers
Automatic translation happens behind the scenes
Graceful degradation for unsupported features

Both approaches are fully supported. You can even mix them!

Unified Protocol

All providers accept the same standard commands and parameters:

// Standard initialization
{
  type: 'init',
  model_id: 'model-name',
  voice_id: 'voice-name', 
  region: 'us-west',
  audio: {
    codec: 'pcm16',
    sample_rate_hz: 24000
  },
  style: {
    temperature: 0.6,
    stability: 0.5
  }
}

// Standard text input
{
  type: 'input_text',
  id: 'msg-1',
  text: 'Hello world'
}

// Standard commands
{type: 'flush'}    // Force synthesis
{type: 'stop'}     // Interrupt synthesis
{type: 'clear'}    // Clear buffer
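
As a minimal sketch, a client might assemble the unified messages for one utterance as shown here. The helper name `buildUtterance` is hypothetical, and sending each message as one JSON text frame is an assumption; only the message shapes come from the examples above. The gateway URL is not given in this section, so the send step is left as a comment.

```javascript
// Hypothetical helper: builds the unified messages for a single utterance.
// Message shapes follow the examples above; the function name is illustrative.
function buildUtterance(modelId, voiceId, text, id = 'msg-1') {
  return [
    {
      type: 'init',
      model_id: modelId,
      voice_id: voiceId,
      audio: { codec: 'pcm16', sample_rate_hz: 24000 },
    },
    { type: 'input_text', id, text },
    { type: 'flush' }, // force synthesis of whatever is buffered
  ];
}

// Assumption: each message travels as one JSON text frame.
const frames = buildUtterance('model-name', 'voice-name', 'Hello world')
  .map((message) => JSON.stringify(message));

// With an already-open WebSocket `ws`:
// frames.forEach((frame) => ws.send(frame));
```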

Command Support Matrix

Provider	flush	stop	clear	configure	setStyle
SLNG-Hosted Models
Orpheus	✅	✅*	✅	❌	❌
Orpheus Indic	✅	✅*	✅	❌	❌
Kokoro	✅	✅*	✅	❌	❌
CosyVoice	✅	✅*	✅	❌	❌
Chatterbox	✅	✅*	✅	❌	❌
External Providers
ElevenLabs	✅	❌	❌	✅	✅
Deepgram	✅	❌	✅	❌	❌
Cartesia	✅	❌	❌	✅	✅

*Note: The stop command for SLNG models is implemented via a connection-close workaround: the gateway closes the backend connection and sends an audio_end event.
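
Because unsupported commands have no effect, a client may want to consult the matrix before relying on one. The sketch below transcribes the matrix into a lookup table; the names `COMMAND_SUPPORT` and `supports` are hypothetical and not part of any SDK.

```javascript
// Command support per provider, transcribed from the matrix above.
const SLNG_ROW = { flush: true, stop: true, clear: true, configure: false, setStyle: false };

const COMMAND_SUPPORT = {
  'Orpheus': SLNG_ROW,
  'Orpheus Indic': SLNG_ROW,
  'Kokoro': SLNG_ROW,
  'CosyVoice': SLNG_ROW,
  'Chatterbox': SLNG_ROW,
  'ElevenLabs': { flush: true, stop: false, clear: false, configure: true, setStyle: true },
  'Deepgram': { flush: true, stop: false, clear: true, configure: false, setStyle: false },
  'Cartesia': { flush: true, stop: false, clear: false, configure: true, setStyle: true },
};

// Returns true only when the matrix marks the command as supported.
// Unknown providers or commands are treated as unsupported.
function supports(provider, command) {
  return COMMAND_SUPPORT[provider]?.[command] === true;
}
```

For example, `supports('ElevenLabs', 'stop')` is false, so a client that needs to interrupt playback on that provider has to handle it locally, for instance by discarding audio it has already received.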

Provider-Specific Parameter Translation

SLNG-Hosted Models (Orpheus, Orpheus Indic, Kokoro*, CosyVoice, Chatterbox, XTTS)

*Note: Kokoro generates complete audio before streaming begins. The model processes the entire text, then chunks and streams the generated audio. This results in slightly higher initial latency but ensures consistent quality.

Unified → Native Translation:

// Unified
{
  voice_id: 'nova',
  style: {
    temperature: 0.8,
    top_p: 0.9,
    repetition_penalty: 1.2
  },
  max_tokens: 2000,
  buffer_size: 15
}

// Translated to SLNG-hosted format
{
  voice: 'nova',
  temperature: 0.8,
  top_p: 0.9,
  repetition_penalty: 1.2,
  max_tokens: 2000,
  buffer_size: 15
}
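
The mapping above amounts to renaming `voice_id` to `voice` and flattening `style` into the top level. A minimal sketch of that transformation follows; `toSlngHosted` is a hypothetical name, and the gateway's actual implementation may treat fields not shown in the example differently.

```javascript
// Sketch of the unified -> SLNG-hosted mapping shown above.
// Only the fields in the example are covered; any other top-level field is
// passed through unchanged, which is an assumption.
function toSlngHosted(unified) {
  const { voice_id, style = {}, ...rest } = unified;
  // voice_id becomes voice; style.* moves to the top level.
  return { voice: voice_id, ...style, ...rest };
}

const slngNative = toSlngHosted({
  voice_id: 'nova',
  style: { temperature: 0.8, top_p: 0.9, repetition_penalty: 1.2 },
  max_tokens: 2000,
  buffer_size: 15,
});
```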

ElevenLabs

Unified → Native Translation:

// Unified
{
  voice_id: '21m00Tcm4TlvDq8ikWAM',
  style: {
    stability: 0.6,
    similarity_boost: 0.8,
    style: 0.2,
    use_speaker_boost: true
  },
  chunk_length_schedule: [120, 160, 250]
}

// Translated to ElevenLabs format
{
  voice_settings: {
    stability: 0.6,
    similarity_boost: 0.8,
    style: 0.2,
    use_speaker_boost: true
  },
  generation_config: {
    chunk_length_schedule: [120, 160, 250]
  }
}
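
In this example `style` becomes `voice_settings`, and `chunk_length_schedule` moves under `generation_config`. The translated message contains no `voice_id`, so the voice is presumably carried elsewhere, such as the connection URL. A sketch of the visible part of the mapping, with the hypothetical name `toElevenLabs`:

```javascript
// Sketch of the unified -> ElevenLabs mapping shown above.
// voice_id is intentionally not copied: it does not appear in the translated
// message in the example.
function toElevenLabs(unified) {
  const native = {};
  if (unified.style) {
    native.voice_settings = { ...unified.style };
  }
  if (unified.chunk_length_schedule) {
    native.generation_config = {
      chunk_length_schedule: [...unified.chunk_length_schedule],
    };
  }
  return native;
}

const elevenNative = toElevenLabs({
  voice_id: '21m00Tcm4TlvDq8ikWAM',
  style: { stability: 0.6, similarity_boost: 0.8, style: 0.2, use_speaker_boost: true },
  chunk_length_schedule: [120, 160, 250],
});
```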

Deepgram

Unified → Native Translation:

// Unified
{
  model_id: 'aura-asteria-en',
  audio: {
    encoding: 'linear16',
    sample_rate_hz: 24000,
    container: 'none'
  }
}

// Translated to Deepgram URL params
// wss://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=linear16&sample_rate=24000

Cartesia

Unified → Native Translation:

// Unified
{
  model_id: 'sonic-english',
  voice_id: '694f9389-aac1-45b6-b726-9d9369183238',
  language: 'en'
}

// Translated to Cartesia format
{
  context_id: 'uuid-generated',
  model_id: 'sonic-english',
  voice: {
    mode: 'id',
    id: '694f9389-aac1-45b6-b726-9d9369183238'
  },
  output_format: {
    container: 'raw',
    encoding: 'pcm_f32le',
    sample_rate: 24000
  },
  language: 'en'
}
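
The Cartesia mapping wraps `voice_id` in a `voice` object and adds a `context_id` and an `output_format`. In the sketch below the context ID is passed in by the caller, and the default output format simply copies the values from the example; the source does not say how the gateway derives either, so both are assumptions, as is the name `toCartesia`.

```javascript
// Sketch of the unified -> Cartesia mapping shown above.
// contextId is supplied by the caller; the default output format mirrors the
// example and is not a documented default.
function toCartesia(
  unified,
  contextId,
  outputFormat = { container: 'raw', encoding: 'pcm_f32le', sample_rate: 24000 },
) {
  return {
    context_id: contextId,
    model_id: unified.model_id,
    voice: { mode: 'id', id: unified.voice_id },
    output_format: { ...outputFormat },
    language: unified.language,
  };
}

const cartesiaNative = toCartesia(
  { model_id: 'sonic-english', voice_id: '694f9389-aac1-45b6-b726-9d9369183238', language: 'en' },
  'uuid-generated',
);
```

In practice the caller could generate the context ID with `crypto.randomUUID()` where that function is available.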

Command Behavior Differences

Flush Command

SLNG-hosted: Sends {type: "control", action: "flush"} - forces synthesis without closing
ElevenLabs: Sends {text: "", flush: true} - completes current generation
Deepgram: Sends {type: "Flush"} - processes buffered text
Cartesia: Sends {continue: false} - finalizes current context

Stop Command

SLNG-hosted: Gateway closes the backend connection to stop generation immediately, then sends an audio_end event. This connection-close workaround is used because the backend models don't respond to control commands.
ElevenLabs: ❌ Not supported (command ignored)
Deepgram: ❌ Not supported (command ignored)
Cartesia: ❌ Not supported (command ignored)

Clear Command

SLNG-hosted: Sends {type: "control", action: "clear"} - clears buffer without synthesis
ElevenLabs: ❌ Not supported (command ignored)
Deepgram: Sends {type: "Clear"} - clears internal buffer
Cartesia: ❌ Not supported (command ignored)
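
The per-provider behavior for flush and clear can be summarized as a lookup from a unified command to the native frame the gateway sends. This is a sketch only: the provider keys and the function name `toNativeCommand` are hypothetical, and a `null` result stands for "no frame is sent".

```javascript
// Native frames per provider, transcribed from the lists above.
// Stop is absent on purpose: no provider receives a stop frame. For
// SLNG-hosted models the gateway closes the backend connection instead.
const NATIVE_COMMANDS = {
  'slng-hosted': {
    flush: { type: 'control', action: 'flush' },
    clear: { type: 'control', action: 'clear' },
  },
  elevenlabs: {
    flush: { text: '', flush: true },
  },
  deepgram: {
    flush: { type: 'Flush' },
    clear: { type: 'Clear' },
  },
  cartesia: {
    flush: { continue: false },
  },
};

// Returns the native frame, or null when the command produces no frame.
function toNativeCommand(provider, commandType) {
  return NATIVE_COMMANDS[provider]?.[commandType] ?? null;
}
```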

Error Handling

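Unsupported commands are silently ignored: the client receives no error, and the command has no effect. This lets the same client code keep working when switching between providers, but it also means the client cannot tell from the gateway's response whether a command took effect.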

Rate Limits

Account Tier Limits (Gateway-Level)

These limits apply to ALL providers:

Tier	Connections	Duration	Messages/min	Text/msg	Message Size
Free	1	5 min	20	500 chars	16KB
Pro ($49)	5	30 min	60	2K chars	32KB
Enterprise	50	120 min	120	10K chars	128KB
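
A client can check a message against its tier before sending. The sketch below encodes the per-message limits from the table under several assumptions that the source does not confirm: "2K" and "10K" mean 2,000 and 10,000 characters, 1KB is 1,024 bytes, characters are counted as JavaScript string length, and message size is measured on the serialized JSON. The names `TIER_LIMITS` and `checkMessage` are hypothetical.

```javascript
// Per-message limits from the tier table (units interpreted as noted above).
const TIER_LIMITS = {
  Free: { maxChars: 500, maxBytes: 16 * 1024, messagesPerMin: 20 },
  Pro: { maxChars: 2000, maxBytes: 32 * 1024, messagesPerMin: 60 },
  Enterprise: { maxChars: 10000, maxBytes: 128 * 1024, messagesPerMin: 120 },
};

// Returns a list of problems; an empty list means the message fits the tier.
function checkMessage(tier, message) {
  const limits = TIER_LIMITS[tier];
  if (!limits) return [`unknown tier: ${tier}`];

  const problems = [];
  const chars = (message.text ?? '').length;
  if (chars > limits.maxChars) {
    problems.push(`text is ${chars} chars, limit is ${limits.maxChars}`);
  }
  const bytes = new TextEncoder().encode(JSON.stringify(message)).length;
  if (bytes > limits.maxBytes) {
    problems.push(`message is ${bytes} bytes, limit is ${limits.maxBytes}`);
  }
  return problems;
}
```

Connection count, session duration, and messages per minute depend on state over time, so they are not covered by this per-message check.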

Provider-Specific Limits (Additional)

Provider	Limit	Notes
SLNG-Hosted Models	Gateway limits only	No additional provider limits
ElevenLabs	API key based	Character limits by tier
Deepgram	2400 chars/min	5 concurrent, 60-min max
Cartesia	10x concurrency	WebSocket connections = 10x your concurrency limit

Audio Output Formats

All providers stream audio through our gateway using the same delivery method (base64-encoded audio in JSON events); the audio format and sample rate vary by provider:

Provider	Default Format	Sample Rate	Delivery Method	Default Voice
Orpheus	PCM 16-bit	24kHz	base64 in JSON events	tara
Orpheus Indic	PCM 16-bit	24kHz	base64 in JSON events	kanak
Kokoro	PCM 16-bit	24kHz	base64 in JSON events (after full generation)	af
CosyVoice	PCM 16-bit	24kHz	base64 in JSON events	emma
Chatterbox	PCM 16-bit	24kHz	base64 in JSON events	nova
XTTS V2	PCM 16-bit	24kHz	base64 in JSON events	Claribel Dervla
ElevenLabs	MP3/PCM (configurable)	22.05kHz-44.1kHz	base64 in JSON events*
Deepgram	Linear16 PCM	8kHz-48kHz (configurable)	base64 in JSON events*
Cartesia	PCM 16-bit LE	8kHz-44.1kHz (configurable)	base64 in JSON events*

*Note: External providers send binary frames natively, but our gateway converts them to base64-encoded JSON events for consistency with the unified protocol.
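
Since audio arrives as base64 text, the client has to decode it before playback. The sketch below converts one base64 payload into 16-bit PCM samples. It assumes little-endian sample order and takes the base64 string directly, because the name of the event field that carries the audio is not given in this section. `decodePcm16` and `durationSeconds` are hypothetical helper names.

```javascript
// Decodes a base64 string into signed 16-bit PCM samples.
// Assumption: samples are little-endian.
function decodePcm16(base64Audio) {
  const binary = atob(base64Audio);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  // A trailing odd byte, if any, is dropped.
  const samples = new Int16Array(Math.floor(bytes.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}

// Playback length of a mono chunk at the given sample rate.
function durationSeconds(samples, sampleRateHz = 24000) {
  return samples.length / sampleRateHz;
}
```

At the 24kHz default used by the SLNG-hosted models, 24,000 mono samples correspond to one second of audio.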

Best Practices

Use unified parameters - They work across all providers
Check command support - Refer to the matrix above
Handle graceful degradation - Unsupported commands are ignored
Monitor rate limits - Each provider has different limits
Test with different providers - Behavior may vary slightly despite unified interface

Last modified on October 28, 2025
