You need an SLNG API key and a working knowledge of the WebSocket protocol. These examples use the Deepgram Nova model; see Choosing a Model for other available models and endpoints. WebSockets let you transcribe audio in real time as users speak and receive interim results for immediate feedback. If you only need to transcribe pre-recorded files, HTTP is simpler. Set your API key as an environment variable before running the examples:
export SLNG_API_KEY=your_api_key_here

Message Flow

Every STT WebSocket session follows the same pattern:

1. Connect to the model endpoint with your API key in the Authorization header.
2. Send an init message describing your audio (language, sample rate, encoding).
3. Wait for the ready message that confirms the session.
4. Stream binary audio chunks over the socket.
5. Receive partial_transcript and final_transcript messages as results arrive.
6. Send a close message when you are done.

For the full list of message types and parameters, see the WebSocket protocol reference.
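As a concrete illustration, an exchange looks roughly like this on the wire (→ is client to server, ← is server to client; the session ID and transcript values here are made up):

→ {"type": "init", "config": {"language": "en", "sample_rate": 16000, "encoding": "linear16"}}
← {"type": "ready", "session_id": "sess_123"}
→ [binary audio chunk]
→ [binary audio chunk]
← {"type": "partial_transcript", "transcript": "hello wor"}
← {"type": "final_transcript", "transcript": "Hello, world."}
→ {"type": "close"}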

Quick Start

Connect, initialize a session, stream an audio file, and print the transcription. You need a WAV or raw PCM file to test with — any short speech recording works.
// npm install ws
const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = process.env.SLNG_API_KEY;
const AUDIO_FILE = process.argv[2] || "input.wav";

const ws = new WebSocket("wss://api.slng.ai/v1/stt/slng/deepgram/nova:3", {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

ws.on("open", () => {
  // 1. Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        language: "en",
        sample_rate: 16000,
        encoding: "linear16",
      },
    }),
  );
});

ws.on("message", (data) => {
  const message = JSON.parse(data.toString());

  if (message.type === "ready") {
    console.log("Session ready:", message.session_id);

    // 2. Read and stream audio file in chunks
    const audio = fs.readFileSync(AUDIO_FILE);
    const CHUNK_SIZE = 4096;
    for (let i = 0; i < audio.length; i += CHUNK_SIZE) {
      ws.send(audio.slice(i, i + CHUNK_SIZE));
    }

    // 3. Signal end of audio
    ws.send(JSON.stringify({ type: "close" }));
  } else if (message.type === "partial_transcript") {
    console.log("Interim:", message.transcript);
  } else if (message.type === "final_transcript") {
    console.log("Final:", message.transcript);
  } else if (message.type === "error") {
    console.error("Error:", message.message);
    ws.close();
  }
});

ws.on("close", () => {
  console.log("Connection closed");
});
Run with:
node stt.js recording.wav
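The Quick Start pushes the whole file as fast as the socket accepts it, which is fine for testing. To approximate live capture, pace the chunks at real-time rate instead. Here is a minimal sketch, assuming the 16 kHz, 16-bit mono config above (32,000 bytes of audio per second, so each 4,096-byte chunk covers about 128 ms):

// Drop-in replacement for step 2 of the Quick Start: paced streaming.
const CHUNK_SIZE = 4096;
const BYTES_PER_SECOND = 16000 * 2; // 16 kHz, 16-bit mono (linear16)
const CHUNK_MS = (CHUNK_SIZE / BYTES_PER_SECOND) * 1000; // ~128 ms per chunk

async function streamPaced(ws, audio) {
  for (let i = 0; i < audio.length; i += CHUNK_SIZE) {
    ws.send(audio.subarray(i, i + CHUNK_SIZE));
    // Wait roughly as long as the chunk lasts before sending the next one.
    await new Promise((resolve) => setTimeout(resolve, CHUNK_MS));
  }
  ws.send(JSON.stringify({ type: "close" })); // 3. Signal end of audio
}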

Going Further

The WebSocket STT API supports several options you can set in the init config or take advantage of in the response:
  • Interim vs final transcripts — Partial transcripts update in real-time as the user speaks. Final transcripts are confirmed segments that won’t change. Use partials for live captions and finals for processing.
  • Language — Pass a language code in the init config for better accuracy. Not all models auto-detect.
  • Endpointing — Controls how quickly the API finalizes a transcript after silence. Useful for voice agents where you want fast turn-taking.
  • Close vs finalize — Send { "type": "close" } when you are done to end the session. Use { "type": "finalize" } to flush results mid-session without disconnecting.
  • Keep-alive — For long-running sessions with periods of silence, send { "type": "keepalive" } periodically to prevent idle disconnection. Endpointing, finalize, and keep-alive are shown together in the sketch below.
For the full parameter list per model, see the Speech-to-Text API reference.
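For example, a long-running voice-agent session might combine all three: keep-alives during silence, finalize at turn boundaries, and close at the end of the conversation. In this sketch the keepalive, finalize, and close message types come from the list above, but the endpointing field in the init config is a hypothetical parameter name; check the STT API reference for the exact field your model supports.

// Sketch of a long-running session. Assumes the same endpoint and auth as
// the Quick Start. `endpointing` is a hypothetical init parameter; consult
// the STT API reference for the real field name and units.
const WebSocket = require("ws");

const ws = new WebSocket("wss://api.slng.ai/v1/stt/slng/deepgram/nova:3", {
  headers: { Authorization: `Bearer ${process.env.SLNG_API_KEY}` },
});

let keepaliveTimer;

ws.on("open", () => {
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        language: "en",
        sample_rate: 16000,
        encoding: "linear16",
        endpointing: 300, // hypothetical: finalize after ~300 ms of silence
      },
    }),
  );

  // Keep-alive: prevent idle disconnection during long silences.
  keepaliveTimer = setInterval(() => {
    ws.send(JSON.stringify({ type: "keepalive" }));
  }, 10000);
});

// Call at a turn boundary to flush pending results without disconnecting.
function flushTurn() {
  ws.send(JSON.stringify({ type: "finalize" }));
}

// Call when the conversation is over.
function endSession() {
  clearInterval(keepaliveTimer);
  ws.send(JSON.stringify({ type: "close" }));
}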

Next Steps

  • Live STT demo: try real-time speech recognition in your browser; no setup needed.
  • STT HTTP examples: simpler integration for pre-recorded files.
  • WebSocket protocol: full message types, parameters, and error codes.
  • STT API reference: endpoint-specific parameters.