You need an SLNG API key and a working knowledge of the WebSocket protocol. These examples use the Deepgram Aura model; swap the endpoint path to use a different provider. WebSockets let you stream audio with sub-100ms latency and cancel mid-sentence, which is what you need for voice agent conversations. If you only need to generate audio files, HTTP is simpler.

Message Flow

Every TTS WebSocket session follows the same pattern: connect, send an init message with your audio config, and wait for the server's ready message. Then stream one or more text messages; the server returns synthesized audio as binary frames. When you have sent everything, send flush, and the server delivers any buffered audio before confirming with flushed. For the full list of message types and parameters, see the WebSocket protocol reference.

Quick Start

Connect, initialize a session, send text, and receive audio:
const ws = new WebSocket("wss://api.slng.ai/v1/tts/slng/deepgram/aura:2");

// Receive binary frames as ArrayBuffer; the browser default is Blob,
// which would fail the instanceof ArrayBuffer check below.
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  // 1. Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        encoding: "linear16",
        sample_rate: 24000,
      },
    }),
  );

  // 2. Send text to convert
  ws.send(
    JSON.stringify({
      type: "text",
      text: "Hello! This is a WebSocket TTS example.",
    }),
  );

  // 3. Flush to get remaining audio
  ws.send(
    JSON.stringify({
      type: "flush",
    }),
  );
};

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary audio data - play it!
    playAudio(event.data);
  } else {
    // JSON control messages
    const message = JSON.parse(event.data);
    if (message.type === "ready") {
      console.log("Session ready:", message.session_id);
    } else if (message.type === "flushed") {
      console.log("All audio sent");
    }
  }
};
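The playAudio helper above is left undefined. One way it could look in a browser is sketched below, assuming the linear16 encoding and 24000 Hz sample rate from the init config; the PCM conversion and Web Audio playback are illustrative, not part of the SLNG API:

```javascript
// Convert a chunk of 16-bit little-endian PCM (linear16) to Float32
// samples in [-1, 1), the format the Web Audio API expects.
function pcm16ToFloat32(arrayBuffer) {
  const pcm = new Int16Array(arrayBuffer);
  const floats = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    floats[i] = pcm[i] / 32768;
  }
  return floats;
}

let audioCtx;

// Play one mono audio chunk. The AudioContext is created lazily because
// browsers require a user gesture before audio playback can start.
function playAudio(arrayBuffer) {
  audioCtx ??= new AudioContext({ sampleRate: 24000 });
  const samples = pcm16ToFloat32(arrayBuffer);
  const audioBuffer = audioCtx.createBuffer(1, samples.length, 24000);
  audioBuffer.copyToChannel(samples, 0);
  const source = audioCtx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioCtx.destination);
  source.start();
}
```

For gapless playback in a real agent you would schedule each chunk at a running timestamp (source.start(nextStartTime)) instead of starting every buffer immediately.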

More Examples

Batch Text Streaming

For smoother speech, send text as individual sentences rather than as one large block:
const sentences = text.split(". ");

for (const sentence of sentences) {
  ws.send(
    JSON.stringify({
      type: "text",
      text: sentence.endsWith(".") ? sentence : sentence + ".",
    }),
  );
}

// Flush after all sentences are queued
ws.send(JSON.stringify({ type: "flush" }));
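Splitting on ". " works for simple text but mishandles abbreviations and sentences ending in "?" or "!". As a sketch, Intl.Segmenter (built into modern browsers and Node 16+) gives a more robust split; the splitSentences helper below is an illustration, not part of the API:

```javascript
// Split text into whole sentences using Intl.Segmenter, which applies
// Unicode sentence-boundary rules instead of a naive ". " split.
function splitSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return [...segmenter.segment(text)]
    .map((s) => s.segment.trim())
    .filter((s) => s.length > 0);
}
```

You can then iterate over splitSentences(text) in place of the split(". ") loop above; each sentence keeps its own terminal punctuation, so no "." needs to be re-appended.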

Next Steps