WebSocket Unified Protocol Guide
🎯 Important: You Have Two Options!
Option 1: Use Provider-Native Format (If you prefer)
Each provider's original API still works exactly as before. No changes needed to existing code!
Option 2: Use Unified Protocol (Write once, run anywhere)
A single API that works across all providers. Same code for everyone!
What is the Unified Protocol?
The unified protocol is an OPTIONAL standard WebSocket API that works across all TTS providers. You can use it for portability, or stick with provider-native formats - both work!
The Problem It Solves (If You Want It Solved)
Before (Provider-Specific):
Code
After (Unified):
Code
Core Concepts
1. Standard Commands
Every provider endpoint accepts these commands, including the ones that a given provider does not implement:
- init - Initialize session
- input_text - Send text to synthesize
- flush - Force synthesis
- stop - Interrupt (where supported)
- clear - Clear buffer (where supported)
- configure - Update settings (where supported)
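As a rough illustration, the six commands can be modeled as JSON messages tagged with a type field. The tagged shape for input_text, flush, and stop follows the migration tables in this guide; the shape of init, clear, and configure, and the helper name makeCommand, are assumptions made for this sketch.

```javascript
// The six standard command names listed above.
const STANDARD_COMMANDS = ["init", "input_text", "flush", "stop", "clear", "configure"];

// Hypothetical helper: build a JSON-encoded command message.
// Assumes every command is a JSON object tagged with a "type" field.
function makeCommand(type, fields = {}) {
  if (!STANDARD_COMMANDS.includes(type)) {
    throw new Error(`Unknown command: ${type}`);
  }
  return JSON.stringify({ ...fields, type });
}

// A typical session: initialize, send text, then force synthesis.
const sessionMessages = [
  makeCommand("init", { voice_id: "tara" }),
  makeCommand("input_text", { text: "Hello, world." }),
  makeCommand("flush"),
];
```

Each string in sessionMessages is what a client would pass to its WebSocket send call, in order.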
2. Graceful Degradation
Unsupported commands are silently ignored instead of raising errors:
- Send stop to ElevenLabs → Ignored, no error
- Send configure to Orpheus → Ignored, no error
- Your code keeps working!
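The behavior described above can be simulated with a small mock. This is only a sketch of the idea, not the service's implementation: the capability table holds just the two examples from the list, and the names UNSUPPORTED and handleCommand are hypothetical.

```javascript
// Hypothetical capability table for illustration only. The two entries
// mirror the examples above; the Provider Matrix is the real source.
const UNSUPPORTED = new Map([
  ["elevenlabs", ["stop"]],
  ["orpheus", ["configure"]],
]);

// Simulates how an endpoint might treat an incoming command:
// unsupported commands are dropped silently and never produce an error.
function handleCommand(provider, message) {
  const ignored = (UNSUPPORTED.get(provider) || []).includes(message.type);
  return { handled: !ignored, error: null };
}

// Client code sends the same command to every provider without branching.
const stopResults = ["elevenlabs", "orpheus"].map((provider) =>
  handleCommand(provider, { type: "stop" })
);
```

In this mock, the stop sent to elevenlabs is dropped and the one sent to orpheus is handled, and neither call reports an error.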
3. Unified Parameters
Use the same parameter names everywhere:
- voice_id (not voice, model, or voice.id)
- codec (not encoding or format)
- sample_rate_hz (not sample_rate or sampleRate)
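When porting existing configuration objects, the renames above can be applied mechanically. The sketch below assumes that voice.id refers to a nested voice object with an id field; the helper name toUnifiedParams is hypothetical, and only keys are renamed, not values.

```javascript
// Provider-specific parameter names from the list above, mapped to unified names.
const PARAM_ALIASES = new Map([
  ["voice", "voice_id"],
  ["model", "voice_id"],
  ["encoding", "codec"],
  ["format", "codec"],
  ["sample_rate", "sample_rate_hz"],
  ["sampleRate", "sample_rate_hz"],
]);

// Hypothetical helper: rename legacy keys, leaving unified keys untouched.
function toUnifiedParams(params) {
  const unified = {};
  for (const [key, value] of Object.entries(params)) {
    if (key === "voice" && value !== null && typeof value === "object") {
      // Assumed nested form: { voice: { id: "..." } } becomes voice_id.
      unified.voice_id = value.id;
      continue;
    }
    unified[PARAM_ALIASES.get(key) ?? key] = value;
  }
  return unified;
}

const unifiedParams = toUnifiedParams({ voice: "tara", encoding: "pcm16", sampleRate: 24000 });
```

The sample rate of 24000 is an arbitrary illustrative value.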
Migration Guide
From SLNG-Hosted Models (Orpheus, Kokoro, etc.)
Old Way | New Way |
---|---|
?voice=tara | ?voice_id=tara |
"Plain text string" | {type: 'input_text', text: '...'} |
{type: 'control', action: 'flush'} | {type: 'flush'} |
{type: 'control', action: 'stop'} | {type: 'stop'} |
Before:
Code
After:
Code
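One way to picture this migration is as a small adapter that rewrites each legacy message according to the table above. The function names slngToUnified and slngQueryToUnified are hypothetical, and the sketch covers only the four rows listed.

```javascript
// Hypothetical adapter: converts one legacy SLNG-hosted message into its
// unified equivalent, following the table above.
function slngToUnified(legacy) {
  // Legacy text was sent as a plain string.
  if (typeof legacy === "string") {
    return { type: "input_text", text: legacy };
  }
  // Legacy control messages carried the command in an "action" field.
  if (legacy.type === "control" && (legacy.action === "flush" || legacy.action === "stop")) {
    return { type: legacy.action };
  }
  throw new Error("Unrecognized legacy message");
}

// Query-string change: ?voice=tara becomes ?voice_id=tara.
function slngQueryToUnified(query) {
  const params = new URLSearchParams(query);
  if (params.has("voice")) {
    params.set("voice_id", params.get("voice"));
    params.delete("voice");
  }
  return "?" + params.toString();
}
```

An adapter like this can help during a gradual migration, but new code is simpler if it emits the unified messages directly.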
From ElevenLabs
Old Way | New Way |
---|---|
{text: '...'} | {type: 'input_text', text: '...'} |
{text: '', flush: true} | {type: 'flush'} |
voice_settings | style in init |
Before:
Code
After:
Code
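The ElevenLabs rows above can be sketched the same way. The function names are hypothetical, and the guide does not say how individual voice_settings fields map onto style, so passing the settings object through unchanged is purely an assumption of this example.

```javascript
// Hypothetical adapter for legacy ElevenLabs-style messages.
function elevenLabsToUnified(legacy) {
  // {text: '', flush: true} was the legacy way to force synthesis.
  if (legacy.flush === true && legacy.text === "") {
    return { type: "flush" };
  }
  if (typeof legacy.text === "string") {
    return { type: "input_text", text: legacy.text };
  }
  throw new Error("Unrecognized legacy message");
}

// voice_settings moves into the init message as "style".
// ASSUMPTION: the settings are passed through unchanged; the real field
// mapping is not specified in this guide.
function elevenLabsInit(voiceId, voiceSettings) {
  return { type: "init", voice_id: voiceId, style: voiceSettings };
}
```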
From Deepgram
Old Way | New Way |
---|---|
?model=aura-asteria-en | voice_id: 'aura-asteria-en' |
?encoding=linear16 | codec: 'pcm16' |
"Plain text" | {type: 'input_text', text: '...'} |
{type: 'Flush'} | {type: 'flush'} |
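A minimal sketch of the Deepgram mapping, assuming the former query parameters become fields of an init message. The function names are hypothetical, and the codec table contains only the linear16 to pcm16 pair shown above.

```javascript
// Encoding names that differ between the legacy format and the unified
// protocol. Only the pair shown in the table above is included.
const DEEPGRAM_CODECS = new Map([["linear16", "pcm16"]]);

// Hypothetical helper: turn legacy query parameters into an init message.
function deepgramQueryToInit(query) {
  const params = new URLSearchParams(query);
  const init = { type: "init" };
  if (params.has("model")) {
    init.voice_id = params.get("model");
  }
  if (params.has("encoding")) {
    const encoding = params.get("encoding");
    init.codec = DEEPGRAM_CODECS.get(encoding) ?? encoding;
  }
  return init;
}

// Hypothetical adapter for legacy messages: plain text and {type: 'Flush'}.
function deepgramToUnified(legacy) {
  if (typeof legacy === "string") {
    return { type: "input_text", text: legacy };
  }
  if (legacy.type === "Flush") {
    return { type: "flush" };
  }
  throw new Error("Unrecognized legacy message");
}
```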
From Cartesia
Old Way | New Way |
---|---|
{transcript: '...'} | {type: 'input_text', text: '...'} |
context_id in each message | context_id in init only |
continue: false | {type: 'flush'} |
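For Cartesia, one legacy message can expand into two unified ones, because continue: false both carries text and ends the utterance. The sketch below assumes that reading of the table; the function names are hypothetical.

```javascript
// context_id is sent once, in init, instead of in every message.
function cartesiaInit(contextId) {
  return { type: "init", context_id: contextId };
}

// Hypothetical adapter: returns the unified messages that replace one
// legacy message. The per-message context_id is intentionally dropped.
function cartesiaToUnified(legacy) {
  const messages = [];
  if (typeof legacy.transcript === "string" && legacy.transcript !== "") {
    messages.push({ type: "input_text", text: legacy.transcript });
  }
  // continue: false marked the end of input; unified clients send flush.
  if (legacy.continue === false) {
    messages.push({ type: "flush" });
  }
  return messages;
}
```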
Best Practices
✅ DO
- Use unified commands for all new code
- Initialize once at the beginning
- Use graceful degradation to your advantage
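These three practices can be combined in a thin session wrapper. This is a sketch under assumptions: createSession is a hypothetical name, send stands for any function that delivers one message (for example, by JSON-encoding it and writing it to a WebSocket), and init is assumed to be a message with a type field like the other commands.

```javascript
// Hypothetical wrapper around any send(message) function.
function createSession(send, initOptions) {
  let initialized = false;

  // Sends init exactly once, before the first real command.
  function ensureInit() {
    if (!initialized) {
      send({ ...initOptions, type: "init" });
      initialized = true;
    }
  }

  return {
    say(text) {
      ensureInit();
      send({ type: "input_text", text });
    },
    flush() {
      ensureInit();
      send({ type: "flush" });
    },
    // Sent without checking the provider; ignored where unsupported.
    stop() {
      ensureInit();
      send({ type: "stop" });
    },
  };
}
```

Callers use only say, flush, and stop, so no provider-specific message shapes leak into application code.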
❌ DON'T
- Don't mix unified and legacy in the same session
- Don't check provider before sending commands
- Don't assume all features work everywhere
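To catch accidental mixing of formats during development, a client could wrap its send function in a guard like the one sketched below. The names UNIFIED_TYPES, isUnifiedMessage, and unifiedOnly are hypothetical, and the check is purely client-side.

```javascript
// Command names from the Standard Commands section.
const UNIFIED_TYPES = ["init", "input_text", "flush", "stop", "clear", "configure"];

// True when the message is an object whose "type" is a standard command.
function isUnifiedMessage(message) {
  return (
    message !== null &&
    typeof message === "object" &&
    UNIFIED_TYPES.includes(message.type)
  );
}

// Hypothetical guard: wraps send() and rejects legacy-shaped messages
// such as plain strings, {type: 'control', ...}, or {type: 'Flush'}.
function unifiedOnly(send) {
  return (message) => {
    if (!isUnifiedMessage(message)) {
      throw new Error("Legacy message sent in a unified session");
    }
    send(message);
  };
}
```

Note that the guard never inspects the provider: every unified command passes through, and unsupported ones are left for the endpoint to ignore.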
Why This Is Actually Simpler
Before: Learn 5+ Different APIs
- Orpheus: Plain text + control commands
- ElevenLabs: JSON with text field
- Deepgram: Plain text with Flush command
- Cartesia: Transcript with context_id
- Future providers: ???
After: Learn ONE API
- init → input_text → flush
- Same commands everywhere
- New providers automatically work
The Power of Write Once, Run Anywhere
Code
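A minimal sketch of the idea, assuming init is a message with a type field and that send is any function delivering one message to an already open connection. The function name speak is hypothetical, and the provider names are used here only as labels for two separate connections.

```javascript
// Hypothetical provider-agnostic routine: the same three commands
// are sent regardless of which provider is on the other end.
function speak(send, voiceId, text) {
  send({ type: "init", voice_id: voiceId });
  send({ type: "input_text", text });
  send({ type: "flush" });
}

// The same call is reused for each provider; only the voice changes.
const sentByProvider = {};
for (const [provider, voice] of [
  ["orpheus", "tara"],
  ["deepgram", "aura-asteria-en"],
]) {
  sentByProvider[provider] = [];
  // In a real client, this callback would write to that provider's socket.
  speak((message) => sentByProvider[provider].push(message), voice, "Hello!");
}
```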
Quick Reference
Essential Commands
Code
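For quick orientation, the three essential commands might look like this as message objects. The init fields use the unified parameter names from this guide, but the exact init shape and the values shown (voice, codec, sample rate) are illustrative assumptions.

```javascript
// Illustrative message shapes for the essential commands.
const essentialCommands = [
  // ASSUMPTION: init carries the unified parameters as top-level fields.
  { type: "init", voice_id: "tara", codec: "pcm16", sample_rate_hz: 24000 },
  { type: "input_text", text: "Hello, world." },
  { type: "flush" },
];

// Each message is JSON-encoded before being written to the socket.
const essentialWire = essentialCommands.map((message) => JSON.stringify(message));
```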
Optional Commands (Use Where Supported)
Code
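The optional commands could be written as follows. Only the stop shape appears in the migration tables; the clear shape is assumed by analogy, and the configure payload is a guess, since this guide does not list its fields.

```javascript
// Illustrative message shapes for the optional commands.
const optionalCommands = {
  stop: { type: "stop" },
  // ASSUMPTION: clear takes no extra fields, like stop and flush.
  clear: { type: "clear" },
  // ASSUMPTION: configure accepts the same unified parameters as init.
  configure: { type: "configure", voice_id: "tara" },
};

// Safe to send to any provider: unsupported commands are ignored.
const optionalWire = Object.values(optionalCommands).map((message) =>
  JSON.stringify(message)
);
```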
Next Steps
- API Reference - Complete technical details
- Examples - Working code examples
- Provider Matrix - See what's supported where