WebSocket Unified Protocol Guide

🎯 Important: You Have Two Options!

Option 1: Use Provider-Native Format (If you prefer)

Each provider's original API still works exactly as before. No changes needed to existing code!

Option 2: Use Unified Protocol (Write once, run anywhere)

A single API that works across all providers. Same code for everyone!

What is the Unified Protocol?

The unified protocol is an OPTIONAL standard WebSocket API that works across all TTS providers. You can use it for portability, or stick with provider-native formats - both work!

The Problem It Solves (If You Want It Solved)

Before (Provider-Specific):

JavascriptCode
// Different code for each provider 😞
if (provider === 'orpheus') {
  ws.send('Plain text');
  ws.send(JSON.stringify({type: 'control', action: 'flush'}));
} else if (provider === 'elevenlabs') {
  ws.send(JSON.stringify({text: 'Hello', flush: true}));
} else if (provider === 'deepgram') {
  ws.send('Hello');
  ws.send(JSON.stringify({type: 'Flush'}));
}

After (Unified):

JavascriptCode
// Same code for ALL providers 🎉
ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));
ws.send(JSON.stringify({type: 'flush'}));

Core Concepts

1. Standard Commands

Every provider accepts these commands, including the ones it does not support:

  • init - Initialize session
  • input_text - Send text to synthesize
  • flush - Force synthesis
  • stop - Interrupt (where supported)
  • clear - Clear buffer (where supported)
  • configure - Update settings (where supported)
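The command list above can be wrapped in small builder helpers so that every message has a consistent shape. This is a minimal sketch: the `commands` object and its function names are illustrative and not part of the protocol.

```javascript
// Hypothetical helpers: one builder per unified command.
// Each returns the JSON string that would be passed to ws.send().
const commands = {
  init: (options = {}) => JSON.stringify({ type: 'init', ...options }),
  inputText: (text) => JSON.stringify({ type: 'input_text', text }),
  flush: () => JSON.stringify({ type: 'flush' }),
  stop: () => JSON.stringify({ type: 'stop' }),
  clear: () => JSON.stringify({ type: 'clear' }),
  configure: (settings = {}) => JSON.stringify({ type: 'configure', ...settings }),
};

// The messages for a minimal session, in send order.
const session = [
  commands.init({ voice_id: 'tara' }),
  commands.inputText('Hello'),
  commands.flush(),
];
```

With an open socket, each entry of `session` would be passed to `ws.send()` in turn.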

2. Graceful Degradation

Unsupported commands are silently ignored instead of raising errors:

  • Send stop to ElevenLabs → Ignored, no error
  • Send configure to Orpheus → Ignored, no error
  • Your code keeps working!
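The behavior above can be pictured with a toy model. This is not the real server logic: the `ignoredCommands` table holds only the two examples listed above, and the `deliver` function is a hypothetical name used for illustration.

```javascript
// Toy model of graceful degradation; NOT the actual server implementation.
// The two entries come from the examples above; anything else is assumed handled.
const ignoredCommands = {
  elevenlabs: new Set(['stop']),
  orpheus: new Set(['configure']),
};

// Returns 'ignored' or 'handled'. It never throws for an unsupported command.
function deliver(provider, message) {
  const ignored = ignoredCommands[provider] ?? new Set();
  return ignored.has(message.type) ? 'ignored' : 'handled';
}

const results = [
  deliver('elevenlabs', { type: 'stop' }),
  deliver('orpheus', { type: 'configure' }),
  deliver('orpheus', { type: 'flush' }),
];
```

The point of the model is that the caller's control flow is identical in all three cases: no exception is raised and no provider check is needed.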

3. Unified Parameters

Use the same parameter names everywhere:

  • voice_id (not voice, model, or voice.id)
  • codec (not encoding or format)
  • sample_rate_hz (not sample_rate or sampleRate)
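When porting existing configuration objects, the renames above can be applied mechanically. The sketch below assumes flat parameter objects; `toUnifiedParams` is a hypothetical helper, the sample values are placeholders, and the nested `voice.id` form is not handled.

```javascript
// Provider-native names (from the list above) mapped to unified names.
const unifiedNames = new Map([
  ['voice', 'voice_id'],
  ['model', 'voice_id'],
  ['encoding', 'codec'],
  ['format', 'codec'],
  ['sample_rate', 'sample_rate_hz'],
  ['sampleRate', 'sample_rate_hz'],
]);

// Hypothetical helper: renames known keys and leaves all others untouched.
function toUnifiedParams(params) {
  const result = {};
  for (const [key, value] of Object.entries(params)) {
    result[unifiedNames.get(key) ?? key] = value;
  }
  return result;
}

// Placeholder values, chosen only to show the renaming.
const unified = toUnifiedParams({ voice: 'tara', sampleRate: 24000, region: 'us-west' });
```

Only the keys are renamed here. Values may also need translating, as with `linear16` and `pcm16` in the Deepgram migration.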

Migration Guide

From SLNG-Hosted Models (Orpheus, Kokoro, etc.)

| Old Way | New Way |
| --- | --- |
| ?voice=tara | ?voice_id=tara |
| "Plain text string" | {type: 'input_text', text: '...'} |
| {type: 'control', action: 'flush'} | {type: 'flush'} |
| {type: 'control', action: 'stop'} | {type: 'stop'} |

Before:

JavascriptCode
ws.send('Hello world');
ws.send(JSON.stringify({type: 'control', action: 'flush'}));

After:

JavascriptCode
ws.send(JSON.stringify({type: 'input_text', text: 'Hello world'}));
ws.send(JSON.stringify({type: 'flush'}));
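For a codebase with many legacy call sites, the message mappings above could be applied by a small adapter while the migration is in progress. A sketch only: `legacyToUnified` is a hypothetical helper that covers just the message forms listed in this section.

```javascript
// Hypothetical adapter: converts one legacy SLNG-hosted message to unified form.
function legacyToUnified(message) {
  // Legacy plain-text frames become input_text commands.
  if (typeof message === 'string') {
    return { type: 'input_text', text: message };
  }
  // Legacy control frames keep their action name as the unified type.
  if (message.type === 'control' && (message.action === 'flush' || message.action === 'stop')) {
    return { type: message.action };
  }
  throw new Error(`No unified equivalent known for: ${JSON.stringify(message)}`);
}

const converted = [
  legacyToUnified('Hello world'),
  legacyToUnified({ type: 'control', action: 'flush' }),
];
```

Each converted object would then be sent with `ws.send(JSON.stringify(...))`.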

From ElevenLabs

| Old Way | New Way |
| --- | --- |
| {text: '...'} | {type: 'input_text', text: '...'} |
| {text: '', flush: true} | {type: 'flush'} |
| voice_settings | style in init |

Before:

JavascriptCode
ws.send(JSON.stringify({ text: 'Hello', voice_settings: {stability: 0.5} }));

After:

JavascriptCode
ws.send(JSON.stringify({type: 'init', style: {stability: 0.5}}));
ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));
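The same mappings can be expressed as a conversion function. This sketch assumes `voice_settings` appears only on the first ElevenLabs message, so that `init` is emitted once; `elevenLabsToUnified` is a hypothetical name.

```javascript
// Hypothetical adapter: converts one ElevenLabs-style message
// into the list of unified messages that replaces it.
function elevenLabsToUnified(message) {
  const out = [];
  if (message.voice_settings) {
    // voice_settings moves into the style field of init.
    out.push({ type: 'init', style: message.voice_settings });
  }
  if (message.text) {
    out.push({ type: 'input_text', text: message.text });
  }
  if (message.flush === true) {
    out.push({ type: 'flush' });
  }
  return out;
}

const first = elevenLabsToUnified({ text: 'Hello', voice_settings: { stability: 0.5 } });
const flushOnly = elevenLabsToUnified({ text: '', flush: true });
```

Here `first` holds an `init` followed by an `input_text`, and `flushOnly` holds a single `flush`, because the empty `text` is skipped.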

From Deepgram

| Old Way | New Way |
| --- | --- |
| ?model=aura-asteria-en | voice_id: 'aura-asteria-en' |
| ?encoding=linear16 | codec: 'pcm16' |
| "Plain text" | {type: 'input_text', text: '...'} |
| {type: 'Flush'} | {type: 'flush'} |
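As a sketch of the Deepgram mappings above: the helpers below turn a Deepgram-style query string into a unified `init` message and convert the two message forms. Both function names are hypothetical, and placing `voice_id` and `codec` in `init` is an assumption based on the `init` examples elsewhere in this guide.

```javascript
// Hypothetical helper: builds a unified init message from a Deepgram-style query string.
// Only the two parameter mappings listed above are handled.
function deepgramQueryToInit(query) {
  const params = new URLSearchParams(query);
  const init = { type: 'init' };
  if (params.has('model')) init.voice_id = params.get('model');
  if (params.get('encoding') === 'linear16') init.codec = 'pcm16';
  return init;
}

// Hypothetical helper: converts one Deepgram-style message to unified form.
function deepgramMessageToUnified(message) {
  if (typeof message === 'string') return { type: 'input_text', text: message };
  if (message.type === 'Flush') return { type: 'flush' };
  throw new Error(`No unified equivalent known for: ${JSON.stringify(message)}`);
}

const initMessage = deepgramQueryToInit('?model=aura-asteria-en&encoding=linear16');
const messages = ['Hello', { type: 'Flush' }].map(deepgramMessageToUnified);
```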

From Cartesia

| Old Way | New Way |
| --- | --- |
| {transcript: '...'} | {type: 'input_text', text: '...'} |
| context_id in each message | context_id in init only |
| continue: false | {type: 'flush'} |
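Because `context_id` moves from every message into `init`, a Cartesia adapter needs a little state. The sketch below is one possible shape under that reading of the mappings above; `createCartesiaAdapter` and the `ctx-1` value are hypothetical.

```javascript
// Hypothetical adapter for Cartesia-style messages. It remembers whether init
// was already emitted so that context_id is sent once, not on every message.
function createCartesiaAdapter() {
  let initialized = false;
  return function toUnified(message) {
    const out = [];
    if (!initialized && message.context_id) {
      out.push({ type: 'init', context_id: message.context_id });
      initialized = true;
    }
    if (message.transcript) {
      out.push({ type: 'input_text', text: message.transcript });
    }
    // continue: false marks the end of the utterance, which maps to flush.
    if (message.continue === false) {
      out.push({ type: 'flush' });
    }
    return out;
  };
}

const toUnified = createCartesiaAdapter();
const firstBatch = toUnified({ transcript: 'Hello', context_id: 'ctx-1', continue: true });
const secondBatch = toUnified({ transcript: ' world', context_id: 'ctx-1', continue: false });
```

The first call yields `init` plus `input_text`; the second yields `input_text` plus `flush`, with no repeated `context_id`.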

Best Practices

✅ DO

  1. Use unified commands for all new code

    JavascriptCode
    // Good - works everywhere
    ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));
  2. Initialize once at the beginning

    JavascriptCode
    ws.send(JSON.stringify({ type: 'init', voice_id: 'tara', region: 'us-west' }));
  3. Use graceful degradation to your advantage

    JavascriptCode
    // Send advanced commands - they work where supported
    ws.send(JSON.stringify({type: 'stop'})); // Works on some
    ws.send(JSON.stringify({type: 'configure'})); // Works on others
    // No need for provider checks!

❌ DON'T

  1. Don't mix unified and legacy in the same session

    JavascriptCode
    // Bad - confusing
    ws.send(JSON.stringify({type: 'input_text', text: 'Hello'}));
    ws.send('Plain text'); // Don't mix formats
  2. Don't check provider before sending commands

    JavascriptCode
    // Unnecessary - graceful degradation handles this
    if (provider === 'orpheus') {
      ws.send(JSON.stringify({type: 'stop'}));
    }
  3. Don't assume all features work everywhere

    JavascriptCode
    // Check capabilities from ready event instead
    if (capabilities.supports_configure) {
      // Now you know it's supported
    }
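One way to keep the capabilities at hand is a small tracker fed from the socket's message handler. This is a sketch under an assumption: the ready event is taken to look like `{type: 'ready', capabilities: {...}}`, but only `capabilities.supports_configure` is named in this guide, so verify the real shape before relying on it. `createCapabilityTracker` is a hypothetical helper.

```javascript
// Sketch of capability tracking. The ready event shape used here is an
// ASSUMPTION; check it against the actual server messages.
function createCapabilityTracker() {
  let capabilities = {};
  return {
    // Call with each parsed server message.
    onMessage(message) {
      if (message.type === 'ready' && message.capabilities) {
        capabilities = message.capabilities;
      }
    },
    // supports('configure') reads capabilities.supports_configure.
    supports(feature) {
      return capabilities[`supports_${feature}`] === true;
    },
  };
}

const tracker = createCapabilityTracker();
const beforeReady = tracker.supports('configure');
tracker.onMessage({ type: 'ready', capabilities: { supports_configure: true } });
const afterReady = tracker.supports('configure');
```

Until a ready event arrives, `supports` reports `false` for everything, which is the conservative default when a feature's availability changes what the application shows or does.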

Why This Is Actually Simpler

Before: Learn 5+ Different APIs

  • Orpheus: Plain text + control commands
  • ElevenLabs: JSON with text field
  • Deepgram: Plain text with Flush command
  • Cartesia: Transcript with context_id
  • Future providers: ???

After: Learn ONE API

  • init, input_text, flush
  • Same commands everywhere
  • New providers automatically work

The Power of Write Once, Run Anywhere

JavascriptCode
class UniversalTTS {
  // Constructor omitted for brevity: it is expected to open this.ws for the given endpoint name.
  synthesize(text) {
    this.ws.send(JSON.stringify({type: 'input_text', text}));
    this.ws.send(JSON.stringify({type: 'flush'}));
  }
}
// This SAME class works with:
const orpheus = new UniversalTTS('orpheus-websocket-stream');
const elevenlabs = new UniversalTTS('elevenlabs-websocket-stream');
const futureModel = new UniversalTTS('future-websocket-stream');

Quick Reference

Essential Commands

JavascriptCode
// Initialize (once per session)
{type: 'init', voice_id: 'voice-name'}
// Send text
{type: 'input_text', text: 'Your text here'}
// Force synthesis
{type: 'flush'}

Optional Commands (Use Where Supported)

JavascriptCode
// Stop synthesis
{type: 'stop'}
// Clear buffer
{type: 'clear'}
// Update settings
{type: 'configure', voice_id: 'new-voice'}
