Real-time bidirectional streaming for TTS and STT services. WebSockets provide the lowest latency and most interactive experience for voice applications.
Why WebSockets?
Lowest Latency: Sub-100ms response times
Bidirectional: Send and receive data simultaneously
Real-time Feedback: Get immediate responses
Efficient: Single persistent connection
Binary Support: Stream raw audio data
SLNG Protocol
All SLNG WebSocket endpoints use a consistent protocol, regardless of the underlying provider. This ensures easy switching between models without changing your integration.
TTS WebSocket Protocol
Connection
WSS wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}
Available Endpoints:
wss://api.slng.ai/v1/tts/deepgram/aura:2
wss://api.slng.ai/v1/tts/elevenlabs/eleven-flash:2.5
wss://api.slng.ai/v1/tts/slng/canopylabs/orpheus:en
Client → Server Messages
Initialize Session
Initialize a session with model and voice configuration before sending text.
{
  "type": "init",
  "model": "aura:2",
  "voice": "aura-2-thalia-en",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16",
    "language": "en",
    "speed": 1.0
  }
}
Parameters:
type ("init", required): Message type
model (string, required): Model identifier
voice (string, optional): Voice identifier
config (object, optional): Session configuration
config.sample_rate (number, optional): Audio sample rate in Hz
config.encoding ("linear16" | "mp3" | "opus", optional): Audio encoding format
config.language (string, optional): Language code
config.speed (number, optional): Speech speed multiplier
Send Text for Synthesis
Send text to synthesize into audio output.
{
  "type": "text",
  "text": "Hello from SLNG.",
  "flush": false
}
Parameters:
type ("text", required): Message type
text (string, required): Text to synthesize
flush (boolean, optional): Whether to flush remaining audio immediately
Flush Buffer
Force any buffered text/audio to be finalized and delivered.
Clear Buffer
Clear any queued text/audio from the current session.
Cancel Generation
Cancel the current generation and stop any further audio.
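These three control messages carry no payload beyond their type. The `flush` and `cancel` shapes below match the messages sent in the examples later on this page; the `clear` shape is an assumption based on the same pattern:

```json
{ "type": "flush" }
{ "type": "clear" }
{ "type": "cancel" }
```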
Server → Client Messages
Session Ready
Indicates the session is ready to receive messages.
{
  "type": "ready",
  "session_id": "session_123"
}
Audio Chunk
Chunk of base64-encoded audio data.
{
  "type": "audio_chunk",
  "data": "SGVsbG8gV29ybGQ=",
  "sequence": 1
}
type ("audio_chunk", required): Message type
data (string, required): Base64-encoded audio data
sequence (integer, optional): Sequence number for ordering
Note: Audio may also be delivered as raw binary WebSocket frames for better performance.
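When audio arrives as an `audio_chunk` JSON message rather than a binary frame, the base64 payload must be decoded back to raw bytes before playback. A minimal browser-side sketch (the function name is illustrative, not part of the API):

```javascript
// Decode a base64 audio_chunk payload into raw PCM bytes.
function decodeAudioChunk(base64Data) {
  const binary = atob(base64Data); // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i); // one byte per character
  }
  return bytes; // pass bytes.buffer to your audio pipeline
}
```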
Segment Start
Signals the start of a synthesized segment.
{
  "type": "segment_start",
  "segment_id": "seg_1"
}
Segment End
Signals the end of a synthesized segment.
{
  "type": "segment_end",
  "segment_id": "seg_1"
}
Flushed
Acknowledges that buffered output was flushed.
Cleared
Acknowledges that queued output was cleared.
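Like the control messages they acknowledge, these carry only a type. The `flushed` type appears in the JavaScript example later on this page; `cleared` is assumed to follow the same shape:

```json
{ "type": "flushed" }
{ "type": "cleared" }
```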
Audio End
Signals the end of audio generation.
{
  "type": "audio_end",
  "duration": 0.12
}
type ("audio_end", required): Message type
duration (number, optional): Audio duration
Error
Indicates an error occurred during synthesis.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
type ("error", required): Message type
code (string, required): Error code
message (string, required): Human-readable error description
Common Error Codes:
auth_error: Invalid or missing API key
config_error: Invalid configuration
rate_limit: Too many requests
provider_error: Upstream provider error
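One way to route these codes is a small handler like the `handleServerError` used in the Best Practices section below. The retry and backoff choices here are illustrative policy, not prescribed by the API:

```javascript
// Map SLNG error codes to recovery actions (illustrative policy).
function handleServerError(code, message) {
  switch (code) {
    case "auth_error":
      // Retrying with the same key will not help.
      throw new Error(`Authentication failed: ${message}`);
    case "config_error":
      // Fix the init payload before reconnecting.
      throw new Error(`Invalid configuration: ${message}`);
    case "rate_limit":
      // Transient: back off and let reconnection logic retry.
      console.warn(`Rate limited: ${message}`);
      return { retry: true, delayMs: 5000 };
    case "provider_error":
      // Upstream hiccup: usually safe to retry.
      console.warn(`Provider error: ${message}`);
      return { retry: true, delayMs: 1000 };
    default:
      console.error(`Unknown error ${code}: ${message}`);
      return { retry: false };
  }
}
```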
STT WebSocket Protocol
Connection
WSS wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}
Available Endpoints:
wss://api.slng.ai/v1/stt/deepgram/nova:2
wss://api.slng.ai/v1/stt/slng/openai/whisper:large-v3
Client → Server Messages
Initialize Session
Initialize a session with recognition configuration before streaming audio.
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "enable_vad": true,
    "enable_diarization": false,
    "enable_word_timestamps": true,
    "enable_partials": true
  }
}
Parameters:
type ("init", required): Message type
config (object, optional): Recognition configuration
config.language (string, optional): Language code for recognition
config.sample_rate (number, optional): Audio sample rate in Hz
config.encoding ("linear16" | "mp3" | "opus", optional): Audio encoding format
config.enable_vad (boolean, optional): Enable voice activity detection
config.enable_diarization (boolean, optional): Enable speaker diarization
config.enable_word_timestamps (boolean, optional): Include word-level timestamps
config.enable_partials (boolean, optional): Enable partial/interim transcripts
Send Audio Data
Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:
Binary frames (recommended for performance):
Send raw PCM audio samples directly as binary WebSocket frames:
// Send raw audio bytes directly
ws.send(audioBuffer); // ArrayBuffer or Uint8Array
JSON messages with base64-encoded data:
{
  "type": "audio",
  "data": "SGVsbG8gV29ybGQ="
}
type ("audio", required): Message type
data (string, required): Base64-encoded audio data
Finalize Transcription
Signal that no more audio frames will be sent.
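As with the TTS control messages, this is a bare type, matching the `finalize` message sent in the STT examples below:

```json
{ "type": "finalize" }
```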
Server → Client Messages
Session Ready
Indicates the session is ready to receive audio.
{
  "type": "ready",
  "session_id": "session_123"
}
Partial Transcript
Interim transcription result.
{
  "type": "partial_transcript",
  "transcript": "Hello",
  "confidence": 0.91
}
type ("partial_transcript", required): Message type
transcript (string, required): Transcribed text so far
confidence (number, optional): Confidence score (0-1)
Final Transcript
Final transcription result with optional metadata.
{
  "type": "final_transcript",
  "transcript": "Hello world",
  "confidence": 0.97,
  "language": "en",
  "duration": 2.5
}
type ("final_transcript", required): Message type
transcript (string, required): Complete transcribed text
confidence (number, optional): Overall confidence score (0-1)
language (string, optional): Detected or specified language code
duration (number, optional): Audio duration
Error
Indicates an error occurred during recognition.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
type ("error", required): Message type
code (string, required): Error code
message (string, required): Human-readable error description
Complete Examples
JavaScript/TypeScript TTS
const ws = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2", {
  headers: { Authorization: "Bearer YOUR_API_KEY" },
});

ws.onopen = () => {
  console.log("Connected");

  // Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      model: "aura:2",
      voice: "aura-2-thalia-en",
      config: {
        encoding: "linear16",
        sample_rate: 24000,
      },
    }),
  );

  // Send text to synthesize
  ws.send(
    JSON.stringify({
      type: "text",
      text: "Hello! This is a WebSocket TTS example.",
    }),
  );

  // Flush to get remaining audio
  ws.send(JSON.stringify({ type: "flush" }));
};

ws.onmessage = (event) => {
  // Handle binary audio data
  if (event.data instanceof ArrayBuffer) {
    playAudio(event.data);
    return;
  }

  // Handle JSON control messages
  const message = JSON.parse(event.data);
  console.log("Server message:", message.type);

  switch (message.type) {
    case "ready":
      console.log("Session ready:", message.session_id);
      break;
    case "audio_chunk":
      // Base64-encoded audio (alternative to binary)
      playAudio(atob(message.data));
      break;
    case "segment_start":
      console.log("Segment started:", message.segment_id);
      break;
    case "segment_end":
      console.log("Segment ended:", message.segment_id);
      break;
    case "flushed":
      console.log("Buffer flushed");
      break;
    case "audio_end":
      console.log("Audio complete, duration:", message.duration);
      break;
    case "error":
      console.error("Error:", message.code, message.message);
      break;
  }
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = (event) => {
  console.log("Connection closed:", event.code, event.reason);
};
Python TTS
import websocket
import json
import base64

def on_message(ws, message):
    # Handle binary audio data
    if isinstance(message, bytes):
        handle_audio(message)
        return

    # Handle JSON control messages
    data = json.loads(message)
    msg_type = data.get("type")

    if msg_type == "ready":
        print(f"Session ready: {data['session_id']}")
    elif msg_type == "audio_chunk":
        # Base64-encoded audio (alternative to binary)
        audio_bytes = base64.b64decode(data["data"])
        handle_audio(audio_bytes)
    elif msg_type == "audio_end":
        print(f"Audio complete, duration: {data.get('duration')}")
    elif msg_type == "error":
        print(f"Error: {data['code']} - {data['message']}")

def on_open(ws):
    # Initialize session
    ws.send(json.dumps({
        "type": "init",
        "model": "aura:2",
        "voice": "aura-2-thalia-en",
        "config": {
            "encoding": "linear16",
            "sample_rate": 24000
        }
    }))

    # Send text
    ws.send(json.dumps({
        "type": "text",
        "text": "Hello from Python!"
    }))

    # Flush
    ws.send(json.dumps({"type": "flush"}))

ws = websocket.WebSocketApp(
    "wss://api.slng.ai/v1/tts/deepgram/aura:2",
    header={"Authorization": "Bearer YOUR_API_KEY"},
    on_message=on_message,
    on_open=on_open
)
ws.run_forever()
JavaScript/TypeScript STT
const ws = new WebSocket("wss://api.slng.ai/v1/stt/deepgram/nova:2", {
  headers: { Authorization: "Bearer YOUR_API_KEY" },
});

ws.onopen = () => {
  console.log("Connected");

  // Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        language: "en",
        sample_rate: 16000,
        encoding: "linear16",
        enable_vad: true,
        enable_partials: true,
      },
    }),
  );

  // Start capturing audio from microphone
  startAudioCapture((audioChunk) => {
    // Option 1: Send as binary (recommended - lower overhead)
    ws.send(audioChunk);

    // Option 2: Send as base64-encoded JSON
    // ws.send(JSON.stringify({
    //   type: "audio",
    //   data: btoa(String.fromCharCode(...new Uint8Array(audioChunk)))
    // }));
  });
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case "ready":
      console.log("Session ready:", message.session_id);
      break;
    case "partial_transcript":
      console.log("Partial:", message.transcript);
      showInterimTranscript(message.transcript);
      break;
    case "final_transcript":
      console.log("Final:", message.transcript);
      console.log("Confidence:", message.confidence);
      displayTranscript(message.transcript);
      break;
    case "error":
      console.error("Error:", message.code, message.message);
      break;
  }
};

ws.onclose = () => {
  console.log("Connection closed");
  stopAudioCapture();
};

// When done, send finalize
function stopRecording() {
  ws.send(JSON.stringify({ type: "finalize" }));
}
Python STT
import websocket
import json
import base64
import pyaudio
import threading

def on_message(ws, message):
    data = json.loads(message)
    msg_type = data.get("type")

    if msg_type == "ready":
        print(f"Session ready: {data['session_id']}")
    elif msg_type == "partial_transcript":
        print(f"Partial: {data['transcript']}")
    elif msg_type == "final_transcript":
        print(f"Final: {data['transcript']}")
        print(f"Confidence: {data.get('confidence')}")
        print(f"Language: {data.get('language')}")
    elif msg_type == "error":
        print(f"Error: {data['code']} - {data['message']}")

def stream_audio(ws):
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=16000,
        input=True,
        frames_per_buffer=1024
    )
    try:
        while True:
            audio_data = stream.read(1024)
            # Option 1: Send as binary (recommended - lower overhead)
            ws.send(audio_data, opcode=websocket.ABNF.OPCODE_BINARY)
            # Option 2: Send as base64-encoded JSON
            # ws.send(json.dumps({
            #     "type": "audio",
            #     "data": base64.b64encode(audio_data).decode()
            # }))
    except Exception as e:
        print(f"Audio streaming stopped: {e}")
    finally:
        stream.close()

def on_open(ws):
    # Initialize session
    ws.send(json.dumps({
        "type": "init",
        "config": {
            "language": "en",
            "sample_rate": 16000,
            "encoding": "linear16",
            "enable_vad": True,
            "enable_partials": True
        }
    }))

    # Start audio streaming in a separate thread
    audio_thread = threading.Thread(target=stream_audio, args=(ws,))
    audio_thread.daemon = True
    audio_thread.start()

ws = websocket.WebSocketApp(
    "wss://api.slng.ai/v1/stt/deepgram/nova:2",
    header={"Authorization": "Bearer YOUR_API_KEY"},
    on_message=on_message,
    on_open=on_open
)
ws.run_forever()
Interruption & Cancellation (TTS)
For voice agents, handling interruptions is critical. When a user starts speaking, you need to stop the current TTS output immediately.
Interrupt Current Speech
// Cancel server-side generation
ws.send(JSON.stringify({ type: "cancel" }));

// Clear local audio buffer
clearAudioQueue();

// Stop current playback
stopCurrentAudioSource();
Complete Voice Agent Pattern
class InterruptibleVoiceAgent {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.isAgentSpeaking = false;
    this.audioQueue = [];
    this.initializeTTS();
    this.initializeSTT();
  }

  initializeSTT() {
    this.stt = new WebSocket("wss://api.slng.ai/v1/stt/deepgram/nova:2", {
      headers: { Authorization: `Bearer ${this.apiKey}` },
    });

    this.stt.onopen = () => {
      this.stt.send(
        JSON.stringify({
          type: "init",
          config: {
            language: "en",
            sample_rate: 16000,
            encoding: "linear16",
            enable_vad: true,
            enable_partials: true,
          },
        }),
      );
    };

    this.stt.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "final_transcript") {
        // User spoke - interrupt agent if speaking
        if (this.isAgentSpeaking) {
          this.interruptAgent();
        }
        // Process user input
        this.handleUserInput(msg.transcript);
      }
    };
  }

  initializeTTS() {
    this.tts = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2", {
      headers: { Authorization: `Bearer ${this.apiKey}` },
    });

    this.tts.onopen = () => {
      this.tts.send(
        JSON.stringify({
          type: "init",
          model: "aura:2",
          voice: "aura-2-thalia-en",
          config: { encoding: "linear16", sample_rate: 24000 },
        }),
      );
    };

    this.tts.onmessage = (event) => {
      // Handle binary audio data
      if (event.data instanceof ArrayBuffer) {
        this.playAudio(event.data);
        return;
      }

      // Handle JSON control messages
      const msg = JSON.parse(event.data);
      if (msg.type === "audio_chunk") {
        // Base64-encoded audio (alternative to binary)
        this.playAudio(atob(msg.data));
      } else if (msg.type === "audio_end") {
        this.isAgentSpeaking = false;
      }
    };
  }

  interruptAgent() {
    // 1. Cancel server-side TTS
    this.tts.send(JSON.stringify({ type: "cancel" }));

    // 2. Clear local audio queue
    this.audioQueue = [];

    // 3. Stop current playback
    if (this.currentAudioSource) {
      this.currentAudioSource.stop();
      this.currentAudioSource = null;
    }

    this.isAgentSpeaking = false;
  }

  async handleUserInput(userText) {
    console.log("User said:", userText);

    // Get response from your LLM
    const response = await this.generateResponse(userText);

    // Speak response
    this.speak(response);
  }

  speak(text) {
    this.isAgentSpeaking = true;
    this.tts.send(
      JSON.stringify({
        type: "text",
        text: text,
      }),
    );
    this.tts.send(JSON.stringify({ type: "flush" }));
  }

  async generateResponse(userText) {
    // Your LLM call here
    return "I understand you said: " + userText;
  }

  playAudio(audioData) {
    // Your audio playback implementation
  }
}

// Usage
const agent = new InterruptibleVoiceAgent("YOUR_API_KEY");
See complete voice agent examples →
Best Practices
1. Connection Management
Implement Reconnection Logic:
class ResilientWebSocket {
  constructor(url) {
    this.url = url;
    this.backoff = 1000;
    this.maxBackoff = 30000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onclose = () => {
      console.log(`Reconnecting in ${this.backoff}ms`);
      setTimeout(() => this.connect(), this.backoff);
      this.backoff = Math.min(this.backoff * 2, this.maxBackoff);
    };

    this.ws.onopen = () => {
      console.log("Connected");
      this.backoff = 1000; // Reset backoff on successful connection
    };
  }
}
2. Error Handling
Always handle errors gracefully:
ws.onerror = (error) => {
  console.error("WebSocket error:", error);
  // Notify user, attempt recovery
};

ws.onmessage = (event) => {
  if (typeof event.data === "string") {
    const message = JSON.parse(event.data);
    if (message.type === "error") {
      handleServerError(message.code, message.message);
    }
  }
};
3. Buffer Management (TTS)
Use flush strategically to control latency vs. quality:
// For low latency: flush after each sentence
ws.send(JSON.stringify({ type: "text", text: sentence }));
ws.send(JSON.stringify({ type: "flush" }));

// For better quality: batch multiple sentences
ws.send(JSON.stringify({ type: "text", text: paragraph }));
// ... send more text ...
ws.send(JSON.stringify({ type: "flush" })); // Flush at the end
4. Audio Format Consistency
Ensure your audio format matches configuration:
// Configuration
{
  encoding: "linear16",  // 16-bit PCM
  sample_rate: 16000     // 16 kHz
}

// Audio capture must match:
// - 16-bit samples
// - 16000 Hz sample rate
// - Single channel (mono)
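For example, browser microphone capture via the Web Audio API yields 32-bit float samples, which must be converted to 16-bit signed PCM before sending under a `linear16` configuration. A minimal sketch (the function name is illustrative):

```javascript
// Convert Web Audio float samples (-1.0..1.0) to 16-bit signed PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1] to avoid integer overflow on hot signals
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Scale asymmetrically: -1 -> -32768, +1 -> 32767
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm; // send pcm.buffer over the WebSocket
}
```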
5. Heartbeat/Keep-Alive
Implement ping/pong to keep connection alive:
let pingInterval = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "ping" }));
  }
}, 30000); // Every 30 seconds

ws.onclose = () => {
  clearInterval(pingInterval);
};
Troubleshooting
Connection Drops
Problem: WebSocket disconnects unexpectedly
Solutions:
Implement reconnection logic with exponential backoff
Send periodic heartbeat/ping messages
Check firewall/proxy settings
Verify network stability
Audio Glitches (TTS)
Problem: Choppy or distorted audio playback
Solutions:
Implement proper audio buffering
Use WebAudio API for smooth playback
Check sample rate matches configuration
Ensure sufficient bandwidth
Delayed Transcriptions (STT)
Problem: Transcription results lag behind audio
Solutions:
Reduce audio chunk size for faster processing
Check network latency
Verify audio format matches configuration
Use a model suited to real-time streaming (e.g., Deepgram Nova)
Authentication Failures
Problem: Connection rejected with 401
Solutions:
Include Authorization header in connection request
Verify API key is valid and active
Check key hasn't expired
Ensure proper header format: Authorization: Bearer YOUR_KEY
Performance Tips
Reuse Connections: Keep the WebSocket open for multiple operations
Batch Text: Send longer text chunks for better efficiency (TTS)
Buffer Audio: Stream audio in appropriate chunk sizes (STT)
Monitor Latency: Track end-to-end latency and optimize
Local Processing: Handle audio encoding/decoding client-side
Last modified on February 12, 2026