You need a working knowledge of the WebSocket protocol. These examples use the Deepgram Aura model. To use a different provider, change the endpoint path and the model and voice in the init message. WebSockets let you stream audio with sub-100ms latency and interrupt speech mid-sentence, which voice agent conversations require. If you only need to generate audio files, HTTP is simpler.

Placeholders

The snippets below use these placeholders. Replace them before running the code.
Placeholder | Replace with
SLNG_API_KEY | An SLNG key from app.slng.ai/api-keys. The snippets read it from the SLNG_API_KEY environment variable.
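If SLNG_API_KEY is unset, the snippets send the header `Bearer undefined`, and the failure only shows up later as a server-side error. A minimal fail-fast guard is sketched below. `authHeaders` is a hypothetical helper name, not part of any SLNG SDK.

```javascript
// Hypothetical helper: build the Authorization header from the environment
// and fail immediately if the key is missing or empty.
function authHeaders(env = process.env) {
  const key = env.SLNG_API_KEY;
  if (!key) {
    throw new Error("SLNG_API_KEY is not set");
  }
  return { Authorization: `Bearer ${key}` };
}

// Usage with the ws package:
// const ws = new WebSocket(url, { headers: authHeaders() });
```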

Message Flow

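Every TTS WebSocket session follows this pattern:

1. Open a WebSocket connection to the TTS endpoint, sending your API key in the Authorization header.
2. Send an init message with the model, voice, and audio config. The server replies with a ready message that includes the session_id.
3. Send one or more text messages. The server streams the synthesized audio back as binary frames.
4. Send a flush message. The server returns any remaining audio, followed by a flushed message.
5. Close the connection.

If something goes wrong, the server sends an error message. For the full list of message types and parameters, see the WebSocket protocol reference. To keep acronyms, names, and domain terms consistent during synthesis, set a pronunciation dictionary, either as a session default or per turn.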
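The session flow (init, text, and flush from the client; ready, binary audio, flushed, and error from the server) can also be sketched as two pure functions. This lets you unit-test your protocol handling without a network connection. The message shapes are copied from the Quick Start. `buildSessionMessages`, `newSessionState`, and `handleServerMessage` are hypothetical names. The `message.message` error field is taken from the Quick Start code, not confirmed against the protocol reference.

```javascript
// Sketch of the session protocol as pure functions (no network).
// Names are hypothetical; message shapes are copied from the Quick Start.

// Client -> server: one init, one text message per chunk, then flush.
function buildSessionMessages(chunks, { model, voice, sampleRate = 24000 }) {
  const messages = [
    { type: "init", model, voice, config: { encoding: "linear16", sample_rate: sampleRate } },
    ...chunks.map((text) => ({ type: "text", text })),
    { type: "flush" },
  ];
  return messages.map((message) => JSON.stringify(message));
}

function newSessionState() {
  return { sessionId: null, audio: [], done: false, error: null };
}

// Server -> client: binary frames carry audio, text frames carry JSON events.
function handleServerMessage(state, data, isBinary) {
  if (isBinary) {
    state.audio.push(data);
    return state;
  }
  const message = JSON.parse(data.toString());
  if (message.type === "ready") {
    state.sessionId = message.session_id;
  } else if (message.type === "flushed") {
    state.done = true;
  } else if (message.type === "error") {
    state.error = message.message;
    state.done = true;
  }
  return state;
}
```

In a real client, you would send each string from `buildSessionMessages` in the `open` handler and call `handleServerMessage` from the `message` listener. Once `state.done` is true, `Buffer.concat(state.audio)` holds the audio.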

Quick Start

Connect, initialize a session, send text, and save the audio to a file. The examples on this page write an output.pcm file containing raw 16-bit signed little-endian PCM audio (mono, 24 kHz). You can play it with ffplay:
ffplay -f s16le -ar 24000 -ac 1 output.pcm
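If you don't have ffplay, you can wrap the raw samples in a WAV container, which most players open directly. The sketch below assumes the format shown above (24 kHz, mono, 16-bit). `pcmToWav` and `pcmDurationSeconds` are illustrative helpers, not part of the SLNG API. The same arithmetic gives a quick sanity check: one second of audio is 24,000 samples × 2 bytes = 48,000 bytes.

```javascript
// Audio format used by the examples (matches the ffplay flags above).
const SAMPLE_RATE = 24000; // Hz
const CHANNELS = 1; // mono
const BYTES_PER_SAMPLE = 2; // 16-bit signed little-endian

// Duration in seconds of a raw PCM buffer in this format.
function pcmDurationSeconds(pcm) {
  return pcm.length / (SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE);
}

// Prepend a standard 44-byte WAV (RIFF) header to raw PCM samples.
function pcmToWav(pcm) {
  const header = Buffer.alloc(44);
  const blockAlign = CHANNELS * BYTES_PER_SAMPLE;
  header.write("RIFF", 0);
  header.writeUInt32LE(36 + pcm.length, 4); // total file size minus 8 bytes
  header.write("WAVE", 8);
  header.write("fmt ", 12);
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(CHANNELS, 22);
  header.writeUInt32LE(SAMPLE_RATE, 24);
  header.writeUInt32LE(SAMPLE_RATE * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32); // bytes per sample frame
  header.writeUInt16LE(BYTES_PER_SAMPLE * 8, 34); // bits per sample
  header.write("data", 36);
  header.writeUInt32LE(pcm.length, 40); // data chunk size
  return Buffer.concat([header, pcm]);
}

// Usage:
// const fs = require("fs");
// fs.writeFileSync("output.wav", pcmToWav(fs.readFileSync("output.pcm")));
```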
// npm install ws
const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = process.env.SLNG_API_KEY;
const ws = new WebSocket("wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en", {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

const audioChunks = [];

ws.on("open", () => {
  // 1. Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      model: "aura:2-en",
      voice: "aura-2-thalia-en",
      config: {
        encoding: "linear16",
        sample_rate: 24000,
      },
    }),
  );

  // 2. Send text to convert
  ws.send(
    JSON.stringify({
      type: "text",
      text: "Hello! This is a WebSocket TTS example.",
    }),
  );

  // 3. Flush to get remaining audio
  ws.send(JSON.stringify({ type: "flush" }));
});

ws.on("message", (data, isBinary) => {
  if (isBinary) {
    audioChunks.push(data);
  } else {
    const message = JSON.parse(data.toString());
    if (message.type === "ready") {
      console.log("Session ready:", message.session_id);
    } else if (message.type === "flushed") {
      console.log("All audio received, saving to output.pcm");
      fs.writeFileSync("output.pcm", Buffer.concat(audioChunks));
      ws.close();
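    } else if (message.type === "error") {
      console.error("Error:", message.message);
      ws.close();
    }
  }
});

// Without an "error" listener, a connection failure throws and crashes the process.
ws.on("error", (err) => {
  console.error("WebSocket error:", err.message);
});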
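To use the Quick Start flow inside a larger app, you can wrap it in a Promise that resolves with the audio buffer. This is a sketch, not an SLNG SDK. `synthesize` and its options are hypothetical. The socket factory can be swapped out so the flow can be tested with a fake socket. A `close` handler makes the Promise reject, instead of hanging, if the connection drops before `flushed`.

```javascript
// Hypothetical helper: run one init/text/flush session and resolve with the audio.
function synthesize(text, {
  apiKey = process.env.SLNG_API_KEY,
  url = "wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en",
  // Injectable for testing; the default loads the ws package (npm install ws).
  createSocket = (u, opts) => {
    const WebSocket = require("ws");
    return new WebSocket(u, opts);
  },
} = {}) {
  return new Promise((resolve, reject) => {
    const ws = createSocket(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const chunks = [];

    ws.on("open", () => {
      ws.send(JSON.stringify({
        type: "init",
        model: "aura:2-en",
        voice: "aura-2-thalia-en",
        config: { encoding: "linear16", sample_rate: 24000 },
      }));
      ws.send(JSON.stringify({ type: "text", text }));
      ws.send(JSON.stringify({ type: "flush" }));
    });

    ws.on("message", (data, isBinary) => {
      if (isBinary) {
        chunks.push(data);
        return;
      }
      const message = JSON.parse(data.toString());
      if (message.type === "flushed") {
        resolve(Buffer.concat(chunks)); // settle before closing
        ws.close();
      } else if (message.type === "error") {
        reject(new Error(message.message));
        ws.close();
      }
    });

    // A settled Promise ignores later calls, so these only fire on unexpected failures.
    ws.on("error", reject);
    ws.on("close", () => reject(new Error("Connection closed before flushed")));
  });
}

// Usage:
// const audio = await synthesize("Hello! This is a WebSocket TTS example.");
// require("fs").writeFileSync("output.pcm", audio);
```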

More Examples

Batch Text Streaming

For smoother speech, stream text one sentence at a time instead of sending one large block. This is a complete example you can run on its own.
// npm install ws
const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = process.env.SLNG_API_KEY;
const ws = new WebSocket("wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en", {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

const text =
  "The sun rose over the mountains. Birds began to sing in the trees. A gentle breeze carried the scent of wildflowers across the valley.";
const audioChunks = [];

ws.on("open", () => {
  ws.send(
    JSON.stringify({
      type: "init",
      model: "aura:2-en",
      voice: "aura-2-thalia-en",
      config: { encoding: "linear16", sample_rate: 24000 },
    }),
  );

  // Stream sentences one at a time
  const sentences = text.split(". ");
  for (const sentence of sentences) {
    ws.send(
      JSON.stringify({
        type: "text",
        text: sentence.endsWith(".") ? sentence : sentence + ".",
      }),
    );
  }

  ws.send(JSON.stringify({ type: "flush" }));
});

ws.on("message", (data, isBinary) => {
  if (isBinary) {
    audioChunks.push(data);
  } else {
    const message = JSON.parse(data.toString());
    if (message.type === "flushed") {
      console.log("All audio received, saving to output.pcm");
      fs.writeFileSync("output.pcm", Buffer.concat(audioChunks));
      ws.close();
    } else if (message.type === "error") {
      console.error("Error:", message.message);
      ws.close();
    }
  }
});
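`text.split(". ")` only splits on periods and drops them, which is why the example appends a period to each piece. The sketch below is a slightly more general splitter. It breaks on `.`, `!`, or `?` followed by whitespace and keeps the original punctuation. `splitSentences` is a hypothetical helper. Like any regex-based splitter, it will still break after abbreviations such as "Dr.".

```javascript
// Hypothetical helper: split on whitespace that follows ".", "!" or "?",
// keeping the punctuation and dropping empty pieces.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

// Usage inside the "open" handler:
// for (const sentence of splitSentences(text)) {
//   ws.send(JSON.stringify({ type: "text", text: sentence }));
// }
```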

Next Steps

Live TTS demo

Try real-time TTS in your browser, no setup needed

WebSocket protocol

Full message types, parameters, and error codes

Pronunciation dictionaries

Reuse pronunciation rules across TTS requests

TTS HTTP examples

Simpler integration for non-streaming use cases

TTS API reference

Endpoint-specific parameters