# Speech-to-text WebSocket examples

> Code samples in Python and Node.js for basic transcription, word timestamps, and diarization.

You need a working knowledge of the [WebSocket protocol](/websockets). These examples use the Deepgram Nova model; see the [Speech-to-Text models](/models/stt) for other models and endpoints.

WebSockets let you transcribe in real time as users speak and receive interim results for immediate feedback. If you only need to transcribe pre-recorded files, [HTTP is simpler](/examples/stt-http).

## Placeholders

The snippets below use these placeholders. Replace them before running the code.

| Placeholder     | Replace with                                                                                                                              |
| --------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `SLNG_API_KEY`  | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys). The snippets read it from the `SLNG_API_KEY` environment variable. |
| `recording.wav` | A local WAV or raw PCM audio file to transcribe                                                                                           |

## Message Flow

Every STT WebSocket session follows this pattern:

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant SLNG
    Client->>SLNG: Connect wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en
    SLNG-->>Client: Connection open
    Client->>SLNG: { type: "init", config }
    SLNG-->>Client: { type: "ready", session_id: "..." }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "partial_transcript", transcript: "..." }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "final_transcript", transcript: "..." }
    Client->>SLNG: { type: "finalize" }
    Client->>SLNG: { type: "close" }
```

For the full list of message types and parameters, see the [WebSocket protocol reference](/websockets).

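
The server half of this flow can be handled by branching on each message's `type` field. The sketch below is a minimal illustration, not part of the SLNG API: `handle_server_message` is a hypothetical helper, and it assumes the fields that appear on this page (`session_id`, `transcript`, and the `message` field on `error`) are the only ones you need.

```python
import json


def handle_server_message(raw):
    """Classify one server text frame by its `type` field.

    Hypothetical helper: returns a (kind, payload) tuple so the caller can
    decide what to do next. Only message types that appear on this page are
    handled; anything else is passed through as "unknown".
    """
    message = json.loads(raw)
    kind = message.get("type")

    if kind == "ready":
        # The session is initialized; binary audio can be streamed from here on.
        return "ready", message.get("session_id")
    if kind in ("partial_transcript", "final_transcript"):
        # Partials may still change; finals are confirmed segments.
        return kind, message.get("transcript", "")
    if kind == "error":
        return "error", message.get("message", "")
    # Unrecognized type: hand the whole parsed message back to the caller.
    return "unknown", message
```

Returning a `(kind, payload)` tuple keeps parsing separate from the socket code, so the same function can sit behind any WebSocket client.
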
***

## Quick Start

Connect, initialize a session, stream an audio file, and print the transcription.

You need a WAV or raw PCM file to test with. Any short speech recording works.

```javascript JavaScript theme={null}
// npm install ws
const WebSocket = require("ws");
const fs = require("fs");

const API_KEY = process.env.SLNG_API_KEY;
const AUDIO_FILE = process.argv[2] || "input.wav";

const ws = new WebSocket("wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en", {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

ws.on("open", () => {
  // 1. Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        language: "en",
        sample_rate: 16000,
        encoding: "linear16",
      },
    }),
  );
});

ws.on("message", (data) => {
  const message = JSON.parse(data.toString());

  if (message.type === "ready") {
    console.log("Session ready:", message.session_id);

    // 2. Read and stream audio file in chunks
    const audio = fs.readFileSync(AUDIO_FILE);
    const CHUNK_SIZE = 4096;
    for (let i = 0; i < audio.length; i += CHUNK_SIZE) {
      ws.send(audio.slice(i, i + CHUNK_SIZE));
    }

    // 3. Signal end of audio
    ws.send(JSON.stringify({ type: "close" }));
  } else if (message.type === "partial_transcript") {
    console.log("Interim:", message.transcript);
  } else if (message.type === "final_transcript") {
    console.log("Final:", message.transcript);
  } else if (message.type === "error") {
    console.error("Error:", message.message);
    ws.close();
  }
});

ws.on("close", () => {
  console.log("Connection closed");
});
```

```python Python theme={null}
# pip install websockets
import asyncio
import json
import os
import sys

import websockets

CHUNK_SIZE = 4096


async def stt_quickstart():
    api_key = os.environ["SLNG_API_KEY"]
    audio_file = sys.argv[1] if len(sys.argv) > 1 else "input.wav"

    uri = "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en"
    headers = {"Authorization": f"Bearer {api_key}"}

    # websockets >= 14 takes `additional_headers`; older releases used `extra_headers`.
    async with websockets.connect(uri, additional_headers=headers) as ws:
        # 1. Initialize session
        await ws.send(json.dumps({
            "type": "init",
            "config": {
                "language": "en",
                "sample_rate": 16000,
                "encoding": "linear16",
            },
        }))

        # Wait for ready before streaming audio
        ready = json.loads(await ws.recv())
        print(f"Session ready: {ready['session_id']}")

        # 2. Read and stream audio file in chunks
        with open(audio_file, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                await ws.send(chunk)

        # 3. Signal end of audio
        await ws.send(json.dumps({"type": "close"}))

        # 4. Receive transcription results
        async for message in ws:
            data = json.loads(message)
            if data["type"] == "partial_transcript":
                print(f"Interim: {data['transcript']}")
            elif data["type"] == "final_transcript":
                print(f"Final: {data['transcript']}")
            elif data["type"] == "error":
                print(f"Error: {data['message']}")
                break


asyncio.run(stt_quickstart())
```

Run with:

```bash JavaScript theme={null}
node stt.js recording.wav
```

```bash Python theme={null}
python stt.py recording.wav
```

***

## Going further

The WebSocket protocol supports several options you can set in the `init` config or take advantage of in the response:

* **Interim vs final transcripts**: Partial transcripts update in real time as the user speaks. Final transcripts are confirmed segments that won't change. Use partials for live captions and finals for processing.
* **Language**: Pass a `language` code in the init config for better accuracy. Not all models auto-detect the language.
* **Endpointing**: Controls how quickly the API finalizes a transcript after silence. Useful for voice agents where you want fast turn-taking.
* **Close vs finalize**: Send `{ "type": "close" }` when you are done to end the session. Use `{ "type": "finalize" }` to flush results mid-session without disconnecting.
* **Keep-alive**: For long-running sessions with periods of silence, send `{ "type": "keepalive" }` periodically to prevent idle disconnection.

For the full parameter list per model, see the [Speech-to-Text API reference](/api-reference/stt/deepgram-nova-3/nova-3-ws).

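
The `close`, `finalize`, and `keepalive` messages are plain JSON text frames, so they can be wrapped in small helpers. The following is a sketch under assumptions: `control_message` and `keepalive_loop` are hypothetical names, and the 5-second interval is a placeholder, since this page does not state the server's idle timeout.

```python
import asyncio
import json

# Assumed value for illustration only; check the protocol reference
# for the actual idle timeout before choosing an interval.
KEEPALIVE_INTERVAL_S = 5.0

CONTROL_TYPES = {"finalize", "keepalive", "close"}


def control_message(kind):
    """Serialize one of the control messages named on this page."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


async def keepalive_loop(send, interval=KEEPALIVE_INTERVAL_S):
    """Await `send(keepalive)` every `interval` seconds until cancelled.

    `send` is any coroutine function that accepts a text frame,
    for example the `send` method of an open WebSocket connection.
    """
    while True:
        await asyncio.sleep(interval)
        await send(control_message("keepalive"))
```

With the Python client from the Quick Start, one option is to start the loop with `asyncio.create_task(keepalive_loop(ws.send))` once the session is ready, send `control_message("finalize")` whenever you want results flushed, and cancel the task before sending `control_message("close")`.
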
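***

## Next Steps

* Try real-time speech recognition in your browser, no setup needed
* [HTTP examples](/examples/stt-http): Simpler integration for pre-recorded files
* [WebSocket protocol reference](/websockets): Full message types, parameters, and error codes
* [Speech-to-Text API reference](/api-reference/stt/deepgram-nova-3/nova-3-ws): Endpoint-specific parameters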