# Speech-to-text WebSocket examples

> Code samples in Python and Node.js for real-time transcription over a WebSocket connection.

You need a working knowledge of the [WebSocket protocol](/websockets). These examples use the Deepgram Nova-3 English model (`nova:3-en`); see [Speech-to-Text models](/models/stt) for other models and endpoints.

WebSockets let you transcribe audio in real time as users speak and receive interim results for immediate feedback. If you only need to transcribe pre-recorded files, [HTTP is simpler](/examples/stt-http).

## Placeholders

The snippets below use these placeholders. Replace them before running the code.

| Placeholder     | Replace with                                                                                                                              |
| --------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `SLNG_API_KEY`  | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys). The snippets read it from the `SLNG_API_KEY` environment variable. |
| `recording.wav` | A local WAV or raw PCM audio file to transcribe                                                                                           |

## Message Flow

Every STT WebSocket session follows this pattern:

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant SLNG
    Client->>SLNG: Connect wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en
    SLNG-->>Client: Connection open
    Client->>SLNG: { type: "init", config }
    SLNG-->>Client: { type: "ready", session_id: "..." }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "partial_transcript", transcript: "..." }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "final_transcript", transcript: "..." }
    Client->>SLNG: { type: "finalize" }
    Client->>SLNG: { type: "close" }
```
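
Apart from the binary audio frames, every message in the diagram above is a small JSON object with a `type` field. The sketch below builds the client-side control messages and turns server messages into printable lines, covering only the message types shown on this page plus `error`. The helper names (`init_message`, `control_message`, `describe`) are hypothetical and not part of any SLNG SDK.

```python
import json


def init_message(language: str = "en", sample_rate: int = 16000,
                 encoding: str = "linear16") -> str:
    """Build the init message the client sends right after the socket opens."""
    return json.dumps({
        "type": "init",
        "config": {
            "language": language,
            "sample_rate": sample_rate,
            "encoding": encoding,
        },
    })


def control_message(kind: str) -> str:
    """Build a bare control message: finalize, close, or keepalive."""
    if kind not in ("finalize", "close", "keepalive"):
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


def describe(raw: str) -> str:
    """Turn one server message into a printable line, keyed on its type."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "ready":
        return f"Session ready: {message['session_id']}"
    if kind == "partial_transcript":
        return f"Interim: {message['transcript']}"
    if kind == "final_transcript":
        return f"Final: {message['transcript']}"
    if kind == "error":
        return f"Error: {message.get('message', '')}"
    return f"Unhandled message type: {kind}"


# Example: the client side of the diagram, in order.
if __name__ == "__main__":
    print(init_message())
    print(describe('{"type": "ready", "session_id": "abc"}'))
    print(control_message("finalize"))
    print(control_message("close"))
```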

For the full list of message types and parameters, see the [WebSocket protocol reference](/websockets).

***

## Quick Start

Connect, initialize a session, stream an audio file, and print the transcripts. Any short speech recording works, as a WAV or raw PCM file, as long as the audio is 16-bit linear PCM (`linear16`) sampled at 16 kHz to match the `init` config.
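
The quickstart below sends the file's bytes as-is, WAV header included, and its `init` config declares 16 kHz `linear16` audio. A file that does not match the config is a common cause of garbled or empty transcripts. Here is a minimal pre-flight sketch using Python's standard `wave` module. It assumes mono input (the `init` config on this page does not set a channel count), and `load_pcm16` is a hypothetical helper. It returns only the raw PCM frames, so the header is not streamed as audio.

```python
import wave

EXPECTED_RATE = 16000      # must match "sample_rate" in the init config
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample for 16-bit linear16 PCM
EXPECTED_CHANNELS = 1      # assumption: mono input


def load_pcm16(path: str) -> bytes:
    """Check a WAV file against the init config and return its raw PCM frames."""
    with wave.open(path, "rb") as wav:
        problems = []
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wav.getframerate()} Hz, expected {EXPECTED_RATE}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected {EXPECTED_CHANNELS}")
        if problems:
            raise ValueError(f"{path}: " + "; ".join(problems))
        # readframes() returns only sample data; the RIFF header is not included.
        return wav.readframes(wav.getnframes())
```

This check applies only to WAV input: raw PCM files have no header, so `wave.open` rejects them. It also rejects compressed WAV files early, because `wave` reads only uncompressed PCM.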

<CodeGroup>
  ```javascript JavaScript theme={null}
  // npm install ws
  const WebSocket = require("ws");
  const fs = require("fs");

  const API_KEY = process.env.SLNG_API_KEY;
  if (!API_KEY) throw new Error("Set the SLNG_API_KEY environment variable");
  const AUDIO_FILE = process.argv[2] || "recording.wav";

  const ws = new WebSocket("wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en", {
    headers: { Authorization: `Bearer ${API_KEY}` },
  });

  ws.on("open", () => {
    // 1. Initialize session
    ws.send(
      JSON.stringify({
        type: "init",
        config: {
          language: "en",
          sample_rate: 16000,
          encoding: "linear16",
        },
      }),
    );
  });

  ws.on("message", (data) => {
    const message = JSON.parse(data.toString());

    if (message.type === "ready") {
      console.log("Session ready:", message.session_id);

      // 2. Read and stream audio file in chunks
      const audio = fs.readFileSync(AUDIO_FILE);
      const CHUNK_SIZE = 4096;
      for (let i = 0; i < audio.length; i += CHUNK_SIZE) {
        ws.send(audio.subarray(i, i + CHUNK_SIZE));
      }

      // 3. Signal end of audio
      ws.send(JSON.stringify({ type: "close" }));
    } else if (message.type === "partial_transcript") {
      console.log("Interim:", message.transcript);
    } else if (message.type === "final_transcript") {
      console.log("Final:", message.transcript);
    } else if (message.type === "error") {
      console.error("Error:", message.message);
      ws.close();
    }
  });

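  // Without an "error" listener, a failed connection (for example, a
  // rejected API key) emits an unhandled "error" event and crashes Node.js.
  ws.on("error", (err) => {
    console.error("WebSocket error:", err.message);
  });

  ws.on("close", () => {
    console.log("Connection closed");
  });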
  ```

  ```python Python theme={null}
  # pip install "websockets>=14"
  import asyncio
  import json
  import os
  import sys
  import websockets

  CHUNK_SIZE = 4096

  async def stt_quickstart():
      api_key = os.environ["SLNG_API_KEY"]
      audio_file = sys.argv[1] if len(sys.argv) > 1 else "recording.wav"
      uri = "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en"
      headers = {"Authorization": f"Bearer {api_key}"}

      # websockets 14+ calls this parameter additional_headers (formerly extra_headers)
      async with websockets.connect(uri, additional_headers=headers) as ws:
          # 1. Initialize session
          await ws.send(json.dumps({
              "type": "init",
              "config": {
                  "language": "en",
                  "sample_rate": 16000,
                  "encoding": "linear16",
              },
          }))

          # Wait for ready before streaming audio
          ready = json.loads(await ws.recv())
          if ready.get("type") != "ready":
              raise RuntimeError(f"Expected a ready message, got: {ready}")
          print(f"Session ready: {ready['session_id']}")

          # 2. Read and stream audio file in chunks
          with open(audio_file, "rb") as f:
              while chunk := f.read(CHUNK_SIZE):
                  await ws.send(chunk)

          # 3. Signal end of audio
          await ws.send(json.dumps({"type": "close"}))

          # 4. Receive transcription results
          async for message in ws:
              data = json.loads(message)
              if data["type"] == "partial_transcript":
                  print(f"Interim: {data['transcript']}")
              elif data["type"] == "final_transcript":
                  print(f"Final: {data['transcript']}")
              elif data["type"] == "error":
                  print(f"Error: {data['message']}")
                  break

  if __name__ == "__main__":
      asyncio.run(stt_quickstart())
  ```
</CodeGroup>

Save the JavaScript snippet as `stt.js` or the Python snippet as `stt.py`, then run it with your audio file:

<CodeGroup>
  ```bash JavaScript theme={null}
  node stt.js recording.wav
  ```

  ```bash Python theme={null}
  python stt.py recording.wav
  ```
</CodeGroup>
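
The quickstart pushes the whole file as fast as the socket accepts it. That works for a test, but live audio does not arrive that way. To mimic a microphone, you can pace the chunks at real time. For 16-bit mono audio at 16 kHz, each 4096-byte chunk holds 2048 samples, or 128 ms. A minimal sketch follows. It assumes mono `linear16` audio and a `send` coroutine such as `ws.send` from the Python quickstart. `paced_stream` and `chunk_duration` are hypothetical helpers.

```python
import asyncio
import time
from typing import Awaitable, Callable

BYTES_PER_SAMPLE = 2  # linear16
CHUNK_SIZE = 4096


def chunk_duration(num_bytes: int, sample_rate: int = 16000, channels: int = 1) -> float:
    """Seconds of audio contained in num_bytes of 16-bit PCM."""
    return num_bytes / (BYTES_PER_SAMPLE * channels * sample_rate)


async def paced_stream(pcm: bytes, send: Callable[[bytes], Awaitable[None]],
                       sample_rate: int = 16000) -> None:
    """Send PCM in fixed-size chunks, sleeping so sending keeps real-time pace."""
    start = time.monotonic()
    sent_seconds = 0.0
    for i in range(0, len(pcm), CHUNK_SIZE):
        chunk = pcm[i:i + CHUNK_SIZE]
        await send(chunk)
        sent_seconds += chunk_duration(len(chunk), sample_rate)
        # Sleep until the wall clock catches up with the audio already sent.
        delay = start + sent_seconds - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
```

To use it in the Python quickstart, read the file into `pcm` (for example `pcm = f.read()`) and call `await paced_stream(pcm, ws.send)` in place of the `while chunk := f.read(CHUNK_SIZE)` loop. Comparing the total audio sent against a monotonic start time, instead of sleeping a fixed 128 ms per chunk, keeps small timing errors from accumulating over a long file.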

***

## Going further

Beyond the quick start, the WebSocket protocol supports several options. Some you set in the `init` config; others affect how you handle the responses:

* **Interim vs final transcripts**: Partial transcripts update in real time as the user speaks. Final transcripts are confirmed segments that won't change. Use partials for live captions and finals for downstream processing.
* **Language**: Pass a `language` code in the `init` config for better accuracy. Not all models detect the language automatically, so set it explicitly when you know it.
* **Endpointing**: Controls how quickly the API finalizes a transcript after the speaker goes silent, which matters for voice agents that need fast turn-taking.
* **Close vs finalize**: Send `{ "type": "close" }` when you are done to end the session. Use `{ "type": "finalize" }` to flush results mid-session without disconnecting.
* **Keep-alive**: For long-running sessions with periods of silence, send `{ "type": "keepalive" }` periodically to prevent idle disconnection.
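
Two of these options shape client code directly. Finals are stable, so you can append them, while each partial only replaces the previous one. A keep-alive is a small JSON message sent on a timer while no audio is flowing. Below is a minimal sketch of both. It assumes the message shapes used in the quickstart and that each final supersedes the partials before it. `TranscriptAssembler` and `keepalive_loop` are hypothetical names, and the 5-second default interval is a placeholder, not a documented SLNG limit.

```python
import asyncio
import json
from typing import Awaitable, Callable


class TranscriptAssembler:
    """Accumulate final segments and track the latest partial for live display."""

    def __init__(self) -> None:
        self.finals: list[str] = []
        self.partial = ""

    def update(self, message: dict) -> None:
        kind = message.get("type")
        if kind == "partial_transcript":
            # A newer partial replaces the previous one.
            self.partial = message["transcript"]
        elif kind == "final_transcript":
            # Assumption: a final supersedes the partials that preceded it.
            if message["transcript"]:
                self.finals.append(message["transcript"])
            self.partial = ""

    @property
    def committed(self) -> str:
        """Text that will not change; use this for downstream processing."""
        return " ".join(self.finals)

    @property
    def display(self) -> str:
        """Committed text plus the in-progress partial, for live captions."""
        return " ".join(part for part in (self.committed, self.partial) if part)


async def keepalive_loop(send: Callable[[str], Awaitable[None]],
                         interval: float = 5.0) -> None:
    """Send a keepalive message every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        await send(json.dumps({"type": "keepalive"}))
```

Feed each parsed server message to `update()`. Start `keepalive_loop(ws.send)` with `asyncio.create_task` when audio stops, and cancel the task when audio resumes or the session closes.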

For the full parameter list per model, see the [Speech-to-Text API reference](/api-reference/stt/deepgram-nova-3/nova-3-ws).

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Live STT demo" icon="play" href="https://slng-stt-demo.onrender.com/">
    Try real-time speech recognition in your browser, no setup needed
  </Card>

  <Card title="STT HTTP examples" icon="code-xml" href="/examples/stt-http">
    Simpler integration for pre-recorded files
  </Card>

  <Card title="WebSocket protocol" icon="plug" href="/websockets">
    Full message types, parameters, and error codes
  </Card>

  <Card title="STT API reference" icon="ear" href="/api-reference/stt/deepgram-nova-3/nova-3-ws">
    Endpoint-specific parameters
  </Card>
</CardGroup>
