WebSocket Unified Protocol Guide

🎯 Important: You Have Two Options!

Option 1: Use Provider-Native Format (If you prefer)

Each provider's original API still works exactly as before. No changes needed to existing code!

Option 2: Use Unified Protocol (Write once, run anywhere)

A single API that works across all providers, so the same client code runs against each of them.

What is the Unified Protocol?

The unified protocol is an OPTIONAL standard WebSocket API that works across all TTS providers. Use it for portability, or stay with provider-native formats; both work!

The Problem It Solves (If You Want It Solved)

Before (Provider-Specific):

Code
 
// Different code for each provider 😞
if (provider === 'orpheus') {
  ws.send('Plain text');
  ws.send(JSON.stringify({type: 'control', action: 'flush'}));
} else if (provider === 'elevenlabs') {
  ws.send(JSON.stringify({text: 'Hello', flush: true}));
} else if (provider === 'deepgram') {
  ws.send('Hello');
  ws.send(JSON.stringify({type: 'Flush'}));
}

After (Unified):

Code
 
// Same code for ALL providers 🎉
ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));
ws.send(JSON.stringify({type: 'flush'}));

Core Concepts

1. Standard Commands

Every provider accepts these commands, even when it does not implement them (unsupported commands are ignored, as described under Graceful Degradation):

init - Initialize session
input_text - Send text to synthesize
flush - Force synthesis
stop - Interrupt (where supported)
clear - Clear buffer (where supported)
configure - Update settings (where supported)
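
As a rough illustration of the command list above, the sketch below builds the JSON strings a client would pass to ws.send(). The names COMMAND_TYPES and buildCommand are hypothetical helpers invented for this example and are not part of the protocol; only the six command types and the voice_id and text fields come from this guide.

```javascript
// The six unified command types listed above.
const COMMAND_TYPES = ['init', 'input_text', 'flush', 'stop', 'clear', 'configure'];

// Serialize a unified command into the JSON string passed to ws.send().
// `fields` carries any extra properties, e.g. {text: 'Hello'} for input_text.
// (buildCommand is a hypothetical helper, not a protocol API.)
function buildCommand(type, fields = {}) {
  if (!COMMAND_TYPES.includes(type)) {
    throw new Error(`Unknown unified command: ${type}`);
  }
  return JSON.stringify({type, ...fields});
}

// Example: the messages for a minimal init → input_text → flush session.
const messages = [
  buildCommand('init', {voice_id: 'tara'}),
  buildCommand('input_text', {text: 'Hello'}),
  buildCommand('flush'),
];
```

Rejecting unknown types on the client is a design choice of this sketch: it catches typos early, since the server ignores commands it does not support.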

2. Graceful Degradation

Unsupported commands are silently ignored rather than rejected with an error:

Send stop to ElevenLabs → Ignored, no error
Send configure to Orpheus → Ignored, no error
Your code keeps working!

3. Unified Parameters

Use the same parameter names everywhere:

voice_id (not voice, model, or voice.id)
codec (not encoding or format)
sample_rate_hz (not sample_rate or sampleRate)
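
If existing code still uses the older names, a small renaming step can bridge the gap. The sketch below is one possible approach: the mapping is taken from the list above, while toUnifiedParams and UNIFIED_NAMES are hypothetical names. It only renames keys; values that also differ between providers (codec names, for instance) would need their own translation.

```javascript
// Legacy parameter names mapped to their unified equivalents,
// taken from the list above.
const UNIFIED_NAMES = new Map([
  ['voice', 'voice_id'],
  ['model', 'voice_id'],
  ['voice.id', 'voice_id'],
  ['encoding', 'codec'],
  ['format', 'codec'],
  ['sample_rate', 'sample_rate_hz'],
  ['sampleRate', 'sample_rate_hz'],
]);

// Return a copy of `params` with legacy keys renamed; values are untouched.
// (toUnifiedParams is a hypothetical helper, not a protocol API.)
function toUnifiedParams(params) {
  const unified = {};
  for (const [key, value] of Object.entries(params)) {
    unified[UNIFIED_NAMES.get(key) ?? key] = value;
  }
  return unified;
}
```

For example, toUnifiedParams({voice: 'tara', sampleRate: 24000}) yields an object with the keys voice_id and sample_rate_hz.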

Migration Guide

From SLNG-Hosted Models (Orpheus, Kokoro, etc.)

Old Way	New Way
`?voice=tara`	`?voice_id=tara`
`"Plain text string"`	`{type: 'input_text', text: '...'}`
`{type: 'control', action: 'flush'}`	`{type: 'flush'}`
`{type: 'control', action: 'stop'}`	`{type: 'stop'}`

Before:

Code
 
ws.send('Hello world');
ws.send(JSON.stringify({type: 'control', action: 'flush'}));

After:

Code
 
ws.send(JSON.stringify({type: 'input_text', text: 'Hello world'}));
ws.send(JSON.stringify({type: 'flush'}));
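
During a gradual migration, it may help to translate legacy messages at a single choke point instead of editing every call site. The following is a minimal sketch of that idea, covering only the rows of the table above; slngToUnified is a hypothetical name.

```javascript
// Translate one legacy SLNG-hosted message into its unified equivalent.
// `legacy` is either a plain text string or a parsed control object.
// (slngToUnified is a hypothetical helper, not a protocol API.)
function slngToUnified(legacy) {
  if (typeof legacy === 'string') {
    return {type: 'input_text', text: legacy};
  }
  const isControl = legacy && legacy.type === 'control';
  if (isControl && ['flush', 'stop'].includes(legacy.action)) {
    return {type: legacy.action};
  }
  throw new Error('Unrecognized legacy message');
}
```

A caller would then send JSON.stringify(slngToUnified(message)) in place of the original message.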

From ElevenLabs

Old Way	New Way
`{text: '...'}`	`{type: 'input_text', text: '...'}`
`{text: '', flush: true}`	`{type: 'flush'}`
`voice_settings`	`style` in init

Before:

Code
 
ws.send(JSON.stringify({
  text: 'Hello',
  voice_settings: {stability: 0.5}
}));

After:

Code
 
ws.send(JSON.stringify({type: 'init', style: {stability: 0.5}}));
ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));

From Deepgram

Old Way	New Way
`?model=aura-asteria-en`	`voice_id: 'aura-asteria-en'`
`?encoding=linear16`	`codec: 'pcm16'`
`"Plain text"`	`{type: 'input_text', text: '...'}`
`{type: 'Flush'}`	`{type: 'flush'}`
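
The Deepgram table has no before/after listing, so here is a rough sketch of the same mapping. It assumes that voice_id and codec are sent in the init message (the table does not say where they go), and it knows only the single codec pair shown above. The names deepgramQueryToInit, deepgramToUnified, and CODEC_NAMES are hypothetical.

```javascript
// Codec names that differ between Deepgram and the unified protocol.
// Only the pair shown in the table is known; other values pass through.
const CODEC_NAMES = new Map([['linear16', 'pcm16']]);

// Build a unified init message from a Deepgram-style query string.
// ASSUMPTION: voice_id and codec belong on the init message.
function deepgramQueryToInit(query) {
  const params = new URLSearchParams(query);
  const init = {type: 'init'};
  if (params.has('model')) {
    init.voice_id = params.get('model');
  }
  if (params.has('encoding')) {
    const encoding = params.get('encoding');
    init.codec = CODEC_NAMES.get(encoding) ?? encoding;
  }
  return init;
}

// Translate one Deepgram-style message (plain text or a parsed JSON object).
function deepgramToUnified(legacy) {
  if (typeof legacy === 'string') {
    return {type: 'input_text', text: legacy};
  }
  if (legacy && legacy.type === 'Flush') {
    return {type: 'flush'};
  }
  throw new Error('Unrecognized legacy message');
}
```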

From Cartesia

Old Way	New Way
`{transcript: '...'}`	`{type: 'input_text', text: '...'}`
`context_id` in each message	`context_id` in init only
`continue: false`	`{type: 'flush'}`
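
Because context_id moves from every message to init only, a Cartesia translation needs a little state. The sketch below is one way to express the table above, assuming one translator per session; createCartesiaTranslator is a hypothetical name.

```javascript
// Create a translator for one session. It remembers whether the init
// command carrying context_id has already been produced.
// (createCartesiaTranslator is a hypothetical helper, not a protocol API.)
function createCartesiaTranslator() {
  let initialized = false;
  return function translate(legacy) {
    const commands = [];
    if (!initialized && legacy.context_id !== undefined) {
      // context_id is sent once, on init, instead of on every message.
      commands.push({type: 'init', context_id: legacy.context_id});
      initialized = true;
    }
    if (legacy.transcript) {
      commands.push({type: 'input_text', text: legacy.transcript});
    }
    if (legacy.continue === false) {
      // continue: false marked the end of input; flush replaces it.
      commands.push({type: 'flush'});
    }
    return commands;
  };
}
```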

Best Practices

✅ DO

Use unified commands for all new code

Code
 
// Good - works everywhere
ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));

Initialize once at the beginning

Code
 
ws.send(JSON.stringify({
  type: 'init',
  voice_id: 'tara',
  region: 'us-west'
}));
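
One way to guarantee that init goes out once, and before any text, is to put the check in a thin wrapper. This is only a sketch: UnifiedSession is a hypothetical class, and it takes a plain send function (for example, (m) => ws.send(m)) so that it does not depend on any particular WebSocket library.

```javascript
// Wraps a send function so that init is sent exactly once, before any text.
// `send` is whatever delivers a string to the server, e.g. (m) => ws.send(m).
// (UnifiedSession is a hypothetical class, not a protocol API.)
class UnifiedSession {
  constructor(send, initOptions = {}) {
    this.send = send;
    this.initOptions = initOptions;
    this.initialized = false;
  }

  speak(text) {
    if (!this.initialized) {
      this.send(JSON.stringify({type: 'init', ...this.initOptions}));
      this.initialized = true;
    }
    this.send(JSON.stringify({type: 'input_text', text}));
    this.send(JSON.stringify({type: 'flush'}));
  }
}
```

Calling speak() twice on the same session produces one init followed by two input_text and flush pairs.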

Use graceful degradation to your advantage

Code
 
// Send advanced commands - they take effect where supported
ws.send(JSON.stringify({type: 'stop'}));      // Ignored where unsupported (e.g. ElevenLabs)
ws.send(JSON.stringify({type: 'configure'})); // Ignored where unsupported (e.g. Orpheus)
// No need for provider checks!

❌ DON'T

Don't mix unified and legacy in the same session

Code
 
// Bad - confusing
ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));
ws.send('Plain text'); // Don't mix formats

Don't check provider before sending commands

Code
 
// Unnecessary - graceful degradation handles this
if (provider === 'orpheus') {
  ws.send(JSON.stringify({type: 'stop'}));
}

Don't assume all features work everywhere

Code
 
// Check capabilities from the ready event instead.
// `capabilities` is the capabilities object received with that event.
if (capabilities.supports_configure) {
  // Now you know configure is supported
}
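
This guide does not show the ready event itself, so the sketch below assumes a shape for it: a JSON message with type 'ready' and a capabilities object holding flags such as supports_configure. Treat that shape, and the helper names readCapabilities and canConfigure, as assumptions, and confirm the real format in the API Reference.

```javascript
// Extract capability flags from a raw server message.
// ASSUMPTION: the ready event looks like
//   {"type": "ready", "capabilities": {"supports_configure": true}}
// Returns null for anything that is not a ready event.
function readCapabilities(rawMessage) {
  let message;
  try {
    message = JSON.parse(rawMessage);
  } catch {
    return null; // not JSON
  }
  if (!message || message.type !== 'ready') {
    return null;
  }
  return message.capabilities ?? {};
}

// Decide whether a configure command will have any effect.
function canConfigure(capabilities) {
  return Boolean(capabilities && capabilities.supports_configure);
}
```

A message handler could call readCapabilities on each incoming text frame and store the first non-null result for later checks.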

Why This Is Actually Simpler

Before: Learn 5+ Different APIs

Orpheus: Plain text + control commands
ElevenLabs: JSON with text field
Deepgram: Plain text with Flush command
Cartesia: Transcript with context_id
Future providers: ???

After: Learn ONE API

init → input_text → flush
Same commands everywhere
New providers work with the same client code

The Power of Write Once, Run Anywhere

Code
 
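class UniversalTTS {
  // ws: an open WebSocket connected to the chosen model's stream
  constructor(ws) {
    this.ws = ws;
  }

  synthesize(text) {
    this.ws.send(JSON.stringify({type: 'input_text', text}));
    this.ws.send(JSON.stringify({type: 'flush'}));
  }
}

// This SAME class works with any provider's stream.
// openStream() is a hypothetical helper that returns an open WebSocket
// for the named endpoint; substitute your own connection code.
const orpheus = new UniversalTTS(openStream('orpheus-websocket-stream'));
const elevenlabs = new UniversalTTS(openStream('elevenlabs-websocket-stream'));
const futureModel = new UniversalTTS(openStream('future-websocket-stream'));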

Quick Reference

Essential Commands

Code
 
// Initialize (once per session)
{type: 'init', voice_id: 'voice-name'}

// Send text
{type: 'input_text', text: 'Your text here'}

// Force synthesis
{type: 'flush'}

Optional Commands (Use Where Supported)

Code
 
// Stop synthesis
{type: 'stop'}

// Clear buffer
{type: 'clear'}

// Update settings
{type: 'configure', voice_id: 'new-voice'}

Next Steps

API Reference - Complete technical details
Examples - Working code examples
Provider Matrix - See what's supported where

Last modified on October 28, 2025
