WebSocket Provider Differences
🎯 Important: Two Ways to Connect
1. Provider-Native Format (Your existing code works!)
- Use each provider's original API format
- No changes needed to existing integrations
- Provider-specific features fully available
2. Unified Protocol (Optional portability)
- One format works with all providers
- Automatic translation happens behind the scenes
- Graceful degradation for unsupported features
Both approaches are fully supported. You can even mix them!
Unified Protocol
All providers accept the same standard commands and parameters, but not every provider acts on every command:
Command Support Matrix
Provider | flush | stop | clear | configure | setStyle |
---|---|---|---|---|---|
SLNG-Hosted Models | | | | | |
Orpheus | ✅ | ✅* | ✅ | ❌ | ❌ |
Orpheus Indic | ✅ | ✅* | ✅ | ❌ | ❌ |
Kokoro | ✅ | ✅* | ✅ | ❌ | ❌ |
CosyVoice | ✅ | ✅* | ✅ | ❌ | ❌ |
Chatterbox | ✅ | ✅* | ✅ | ❌ | ❌ |
XTTS-V2 | ✅ | ✅* | ✅ | ❌ | ❌ |
External Providers | | | | | |
ElevenLabs | ✅ | ❌ | ❌ | ✅ | ✅ |
Deepgram | ✅ | ❌ | ✅ | ❌ | ❌ |
Cartesia | ✅ | ❌ | ❌ | ✅ | ✅ |
*Note: For SLNG-hosted models, stop is implemented with a connection-close workaround: the gateway closes the backend connection and sends an audio_end event.
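Because unsupported commands produce no error, a client that wants to know whether a command will take effect has to consult the matrix itself. A minimal sketch in Python that encodes the table above as data; the lowercase provider keys (such as "orpheus-indic" and "xtts-v2") are hypothetical identifiers chosen for illustration, not official gateway names.

```python
# Command support matrix transcribed from the table above.
# Provider keys are hypothetical, illustrative identifiers.
COMMANDS = ("flush", "stop", "clear", "configure", "setStyle")

SLNG_HOSTED = ("orpheus", "orpheus-indic", "kokoro", "cosyvoice", "chatterbox", "xtts-v2")

# Each provider maps to the set of commands the gateway acts on;
# anything missing from the set is silently ignored.
SUPPORT: dict[str, set[str]] = {
    **{model: {"flush", "stop", "clear"} for model in SLNG_HOSTED},
    "elevenlabs": {"flush", "configure", "setStyle"},
    "deepgram": {"flush", "clear"},
    "cartesia": {"flush", "configure", "setStyle"},
}


def is_supported(provider: str, command: str) -> bool:
    """Return True if the gateway acts on `command` for `provider`."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    try:
        return command in SUPPORT[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


def supported_everywhere() -> list[str]:
    """Commands that every provider in the matrix supports."""
    return [c for c in COMMANDS if all(c in cmds for cmds in SUPPORT.values())]
```

For portable code, supported_everywhere() shows that flush is the only command every provider honors; anything else should be treated as best-effort.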
Provider-Specific Parameter Translation
SLNG-Hosted Models (Orpheus, Orpheus Indic, Kokoro*, CosyVoice, Chatterbox, XTTS-V2)
*Note: Kokoro generates complete audio before streaming begins. The model processes the entire text, then chunks and streams the generated audio. This results in slightly higher initial latency but ensures consistent quality.
Unified → Native Translation:
ElevenLabs
Unified → Native Translation:
Deepgram
Unified → Native Translation:
Cartesia
Unified → Native Translation:
Command Behavior Differences
Flush Command
- SLNG-hosted: Sends {type: "control", action: "flush"}, which forces synthesis without closing the connection.
- ElevenLabs: Sends {text: "", flush: true}, which completes the current generation.
- Deepgram: Sends {type: "Flush"}, which processes buffered text.
- Cartesia: Sends {continue: false}, which finalizes the current context.
Stop Command
- SLNG-hosted: The gateway closes the backend connection to stop generation immediately, then sends an audio_end event.
- ElevenLabs: ❌ Not supported (command ignored)
- Deepgram: ❌ Not supported (command ignored)
- Cartesia: ❌ Not supported (command ignored)
- Implementation (SLNG-hosted): The connection-close workaround is used because the backend models don't respond to a stop control command.
Clear Command
- SLNG-hosted: Sends {type: "control", action: "clear"}, which clears the buffer without synthesizing it.
- ElevenLabs: ❌ Not supported (command ignored)
- Deepgram: Sends {type: "Clear"}, which clears the internal buffer.
- Cartesia: ❌ Not supported (command ignored)
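The per-provider behavior above amounts to a lookup from unified command to native payload. The Python sketch below covers only flush, stop, and clear, because this page does not document native formats for configure or setStyle. The function name translate, the CLOSE_BACKEND sentinel, and the provider keys are hypothetical; this is an illustration of the mapping, not the gateway's actual implementation.

```python
from typing import Optional, Union

SLNG_HOSTED = {"orpheus", "orpheus-indic", "kokoro", "cosyvoice", "chatterbox", "xtts-v2"}

# Hypothetical sentinel: for SLNG stop, nothing is forwarded to the model;
# the gateway closes the backend connection and emits audio_end instead.
CLOSE_BACKEND = "close_backend"

# Native payloads for external providers, copied from the lists above.
EXTERNAL: dict[str, dict[str, dict]] = {
    "elevenlabs": {"flush": {"text": "", "flush": True}},
    "deepgram": {"flush": {"type": "Flush"}, "clear": {"type": "Clear"}},
    "cartesia": {"flush": {"continue": False}},
}

SKETCHED_COMMANDS = ("flush", "stop", "clear")


def translate(provider: str, command: str) -> Optional[Union[dict, str]]:
    """Map a unified command to a native payload.

    Returns a dict to send, CLOSE_BACKEND for the SLNG stop workaround,
    or None when the gateway would silently ignore the command.
    """
    if command not in SKETCHED_COMMANDS:
        raise ValueError(f"{command!r} is outside this sketch")
    if provider in SLNG_HOSTED:
        if command == "stop":
            return CLOSE_BACKEND
        # flush and clear become SLNG control messages.
        return {"type": "control", "action": command}
    if provider not in EXTERNAL:
        raise ValueError(f"unknown provider: {provider!r}")
    payload = EXTERNAL[provider].get(command)
    # Copy so callers can add fields without mutating the table.
    return dict(payload) if payload is not None else None
```

A return value of None corresponds to the "command ignored" rows. Passing a returned dict to json.dumps produces the wire text, for example {"continue": false} for Cartesia.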
Error Handling
Unsupported commands are silently ignored: the client receives no error, but the command has no effect. This keeps client code backward compatible when switching between providers.
Rate Limits
Account Tier Limits (Gateway-Level)
These limits apply to ALL providers:
Tier | Concurrent connections | Max connection duration | Messages/min | Text per message | Max message size |
---|---|---|---|---|---|
Free | 1 | 5 min | 20 | 500 chars | 16KB |
Pro ($49) | 5 | 30 min | 60 | 2K chars | 32KB |
Enterprise | 50 | 120 min | 120 | 10K chars | 128KB |
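Since gateway limits apply to every provider, a client can pre-check messages before sending them. A minimal Python sketch, assuming a sliding 60-second window for Messages/min (the gateway's actual windowing is not documented here) and reading "KB" and "K chars" conservatively as 1,000 bytes and 1,000 characters. MessageGuard and TIERS are hypothetical names.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TierLimits:
    messages_per_min: int
    max_text_chars: int
    max_message_bytes: int


# Values from the tier table; "KB" and "K" read as 1,000 (conservative assumption).
TIERS = {
    "free": TierLimits(20, 500, 16_000),
    "pro": TierLimits(60, 2_000, 32_000),
    "enterprise": TierLimits(120, 10_000, 128_000),
}


class MessageGuard:
    """Hypothetical client-side pre-check against gateway tier limits."""

    def __init__(self, limits: TierLimits, clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock
        self.sent: deque = deque()  # send timestamps within the last 60 s

    def check(self, text: str, payload: bytes) -> None:
        """Raise ValueError if this message would break a limit; otherwise record it."""
        if len(text) > self.limits.max_text_chars:
            raise ValueError("text exceeds the per-message character limit")
        if len(payload) > self.limits.max_message_bytes:
            raise ValueError("message exceeds the size limit")
        now = self.clock()
        # Drop timestamps that have left the sliding 60-second window.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) >= self.limits.messages_per_min:
            raise ValueError("messages-per-minute limit reached")
        self.sent.append(now)
```

The injectable clock makes the guard easy to test; a rejected message is not recorded, so it does not count against the window.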
Provider-Specific Limits (Additional)
Provider | Limit | Notes |
---|---|---|
SLNG-Hosted Models | Gateway limits only | No additional provider limits |
ElevenLabs | API key based | Character limits by tier |
Deepgram | 2400 chars/min | 5 concurrent, 60-min max |
Cartesia | 10x concurrency | WebSocket connections = 10x your concurrency limit |
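Provider limits stack on top of the gateway tier, so the binding limit is whichever is stricter. The Python sketch below works this out for Deepgram, reading its row as 5 concurrent connections and a 60-minute maximum connection duration (an interpretation of the table). The gateway's text ceiling is derived as Messages/min × Text/msg, which is an upper bound rather than a documented limit; Limits, GATEWAY, and effective are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    connections: int    # concurrent WebSocket connections
    minutes: int        # maximum connection duration
    chars_per_min: int  # text throughput ceiling


# Gateway tiers; chars/min is derived as messages/min x text/msg.
GATEWAY = {
    "free": Limits(1, 5, 20 * 500),
    "pro": Limits(5, 30, 60 * 2_000),
    "enterprise": Limits(50, 120, 120 * 10_000),
}

# Deepgram row, read as 5 concurrent connections and a 60-minute cap.
DEEPGRAM = Limits(connections=5, minutes=60, chars_per_min=2_400)


def effective(tier: str, provider: Limits) -> Limits:
    """Both layers apply, so each effective limit is the smaller of the two."""
    g = GATEWAY[tier]
    return Limits(
        connections=min(g.connections, provider.connections),
        minutes=min(g.minutes, provider.minutes),
        chars_per_min=min(g.chars_per_min, provider.chars_per_min),
    )
```

For example, an Enterprise account using Deepgram is effectively capped at 5 connections, 60 minutes, and 2,400 characters per minute. Deepgram's character limit binds on every tier, since even Free allows up to 20 × 500 = 10,000 characters per minute at the gateway.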
Audio Output Formats
All providers stream audio through our gateway with the same delivery method (base64-encoded audio in JSON events); the audio encoding and sample rate vary by provider:
Provider | Default Format | Sample Rate | Delivery Method | Default Voice |
---|---|---|---|---|
Orpheus | PCM 16-bit | 24kHz | base64 in JSON events | tara |
Orpheus Indic | PCM 16-bit | 24kHz | base64 in JSON events | kanak |
Kokoro | PCM 16-bit | 24kHz | base64 in JSON events (after full generation) | af |
CosyVoice | PCM 16-bit | 24kHz | base64 in JSON events | emma |
Chatterbox | PCM 16-bit | 24kHz | base64 in JSON events | nova |
XTTS-V2 | PCM 16-bit | 24kHz | base64 in JSON events | Claribel Dervla |
ElevenLabs | MP3/PCM (configurable) | 22.05kHz-44.1kHz | base64 in JSON events* | |
Deepgram | Linear16 PCM | 8kHz-48kHz (configurable) | base64 in JSON events* | |
Cartesia | PCM 16-bit LE | 8kHz-44.1kHz (configurable) | base64 in JSON events* | |
*Note: External providers send binary frames natively, but our gateway converts them to base64-encoded JSON events for consistency with the unified protocol.
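For the PCM formats above, each JSON event carries raw 16-bit samples encoded as base64. A minimal Python decoding sketch, assuming mono audio and a hypothetical event field named "audio" (the event schema is not shown on this page); it does not apply to MP3 output from ElevenLabs.

```python
import base64
import json
import sys
from array import array

SAMPLE_RATE = 24_000  # SLNG-hosted default from the table above


def decode_pcm16_event(raw: str, field: str = "audio") -> array:
    """Decode one JSON audio event carrying base64 PCM 16-bit LE into samples.

    `field` is a hypothetical key; adjust it to the actual event schema.
    """
    event = json.loads(raw)
    pcm = base64.b64decode(event[field])
    samples = array("h")  # signed 16-bit integers
    if len(pcm) % samples.itemsize:
        raise ValueError("byte count is not a whole number of 16-bit samples")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # wire format is little-endian
    return samples


def duration_seconds(samples: array, sample_rate: int = SAMPLE_RATE) -> float:
    """Playback length of mono samples at the given rate."""
    return len(samples) / sample_rate
```

At 24 kHz, one second of mono 16-bit audio is 48,000 bytes, which is a quick sanity check when buffering chunks.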
Best Practices
- Use unified parameters - They work across all providers
- Check command support - Refer to the matrix above
- Handle graceful degradation - Unsupported commands are ignored
- Monitor rate limits - Gateway tier limits apply to every provider, and some providers add their own
- Test with different providers - Behavior may vary slightly despite unified interface