# WebSocket API reference

> Stream text-to-speech and speech-to-text in real time over WebSocket. Message format, init handshake, sub-100ms streaming, and reconnection patterns.

WebSocket connections give you sub-100ms latency and bidirectional streaming, ideal for voice agents, live transcription, and any use case where audio flows continuously. If you don't need real-time streaming, see [HTTP vs. WebSocket](/protocols) instead.

**Prerequisites:**

* An [SLNG key](/getting-started)
* Basic familiarity with WebSockets (open, send, receive, close)

## Protocol overview

This page documents the WebSocket protocol used by SLNG-hosted models and [bridges](/execution-layer/unified-api). Third-party providers (Deepgram, KugelAudio, Sarvam) may expose their own native WebSocket formats. Check the per-model API reference in the sidebar for provider-specific details.

<Note>Supported encodings, sample rates, and optional fields vary by model. For model-specific parameters, see the Text-to-Speech and Speech-to-Text tabs in the sidebar.</Note>

***

## TTS WebSocket Protocol

For the full per-model parameter list and response schema, see the model's page in the **Text-to-Speech** sidebar.

### Connection

```
WSS wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}
```

Use the direct provider path for proxied models and add the `slng/` prefix for
SLNG-hosted models:

| Hosting                | URL pattern                                                  | Example                                            |
| ---------------------- | ------------------------------------------------------------ | -------------------------------------------------- |
| Proxied provider model | `wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}`      | `wss://api.slng.ai/v1/tts/deepgram/aura:2`         |
| SLNG-hosted model      | `wss://api.slng.ai/v1/tts/slng/{provider}/{model}:{variant}` | `wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en` |
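
The URL patterns in the table can be assembled programmatically. The following is a minimal sketch: `buildSlngUrl` and its option names are illustrative and not part of any SLNG SDK. Because it takes the task as a parameter, the same helper covers the `stt` pattern described later on this page.

```javascript
// Sketch: build an SLNG WebSocket URL from its parts.
// `buildSlngUrl` and its option names are hypothetical, chosen for illustration.
const SLNG_WS_BASE = "wss://api.slng.ai/v1";

function buildSlngUrl({ task, provider, model, variant, hosted = false }) {
  if (task !== "tts" && task !== "stt") {
    throw new Error(`task must be "tts" or "stt", got ${task}`);
  }
  for (const [name, value] of Object.entries({ provider, model, variant })) {
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`${name} is required`);
    }
  }
  // SLNG-hosted models insert the `slng/` prefix before the provider segment.
  const prefix = hosted ? "slng/" : "";
  return `${SLNG_WS_BASE}/${task}/${prefix}${provider}/${model}:${variant}`;
}

console.log(buildSlngUrl({ task: "tts", provider: "deepgram", model: "aura", variant: "2" }));
// wss://api.slng.ai/v1/tts/deepgram/aura:2
console.log(buildSlngUrl({ task: "tts", provider: "deepgram", model: "aura", variant: "2-en", hosted: true }));
// wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en
```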

### Message Flow

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant SLNG
    Client->>SLNG: Connect
    SLNG-->>Client: Connection open
    Client->>SLNG: { type: "init", model, voice, config }
    SLNG-->>Client: { type: "ready", session_id }
    Client->>SLNG: { type: "text", text: "..." }
    SLNG-->>Client: { type: "audio_chunk", data }
    SLNG-->>Client: { type: "audio_chunk", data }
    Client->>SLNG: { type: "flush" }
    SLNG-->>Client: { type: "flushed" }
    Client->>SLNG: { type: "text", text: "..." }
    SLNG-->>Client: { type: "audio_chunk", data }
    Client->>SLNG: { type: "clear" }
    SLNG-->>Client: { type: "cleared" }
    SLNG-->>Client: { type: "audio_end" }
    Client->>SLNG: { type: "close" }
```

Browse all available TTS models and endpoints on the [Text-to-Speech models](/models/tts) page.
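
The ordering in the diagram (send `init`, wait for `ready`, then send `text`) can be enforced on the client with a small queue. This is a sketch under assumptions: `createTtsSession` is a hypothetical helper, and `send` stands for any function that writes a string to an open WebSocket, for example `(s) => ws.send(s)`.

```javascript
// Sketch: hold outgoing text until the server has answered `init` with `ready`.
// `createTtsSession` is a hypothetical helper, not an SLNG SDK function.
function createTtsSession(send, init) {
  let ready = false;
  const pending = [];

  // The init message goes out first, as in the sequence diagram.
  send(JSON.stringify({ type: "init", ...init }));

  return {
    // Queue text until `ready` arrives; afterwards send immediately.
    speak(text) {
      const msg = JSON.stringify({ type: "text", text });
      if (ready) send(msg);
      else pending.push(msg);
    },
    // Feed every JSON message received from the server into this method.
    onServerMessage(raw) {
      const msg = JSON.parse(raw);
      if (msg.type === "ready" && !ready) {
        ready = true;
        while (pending.length > 0) send(pending.shift());
      }
      return msg;
    },
  };
}

// Usage with a fake transport that records what would be sent.
const sent = [];
const session = createTtsSession((s) => sent.push(s), {
  model: "aura:2",
  voice: "aura-2-thalia-en",
});
session.speak("Hello from SLNG.");
console.log(sent.length); // 1 (only init so far)
session.onServerMessage('{"type":"ready","session_id":"session_123"}');
console.log(sent.length); // 2 (the queued text was released)
```

Injecting `send` keeps the helper independent of any particular WebSocket client and makes the ordering easy to test without a network connection.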

### Client → Server Messages

#### Initialize Session

Initialize a session with model and voice configuration before sending text.

```json theme={null}
{
  "type": "init",
  "model": "aura:2",
  "voice": "aura-2-thalia-en",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16",
    "language": "en",
    "speed": 1.0,
    "pronunciation": {
      "mode": "rewrite",
      "name": "brand-pronunciations"
    }
  }
}
```

The `linear16` encoding is 16-bit signed PCM, the most common raw audio format. Use `config.pronunciation` to set a default [pronunciation dictionary](/pronunciation-dictionaries) for the session.

**Parameters:**

| Field                  | Type                                | Required | Description                                    |
| ---------------------- | ----------------------------------- | -------- | ---------------------------------------------- |
| `type`                 | `"init"`                            | Yes      | Message type                                   |
| `model`                | `string`                            | Yes      | Model identifier                               |
| `voice`                | `string`                            | No       | Voice identifier                               |
| `config`               | `object`                            | No       | Session configuration                          |
| `config.sample_rate`   | `number`                            | No       | Audio sample rate in Hz                        |
| `config.encoding`      | `"linear16"` \| `"mp3"` \| `"opus"` | No       | Audio encoding format                          |
| `config.language`      | `string`                            | No       | Language code                                  |
| `config.speed`         | `number`                            | No       | Speech speed multiplier                        |
| `config.pronunciation` | `object`                            | No       | Default pronunciation dictionary for TTS turns |
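
Validating the `init` payload on the client catches mistakes before they come back as a `config_error`. The sketch below checks only what the table states (required `model`, the three listed encodings). The positive-number check on `sample_rate` and `speed` is an added assumption, and `buildTtsInit` is an illustrative name.

```javascript
// Sketch: validate and serialize a TTS `init` message.
// `buildTtsInit` is a hypothetical helper, not an SLNG SDK function.
const TTS_ENCODINGS = ["linear16", "mp3", "opus"];

function buildTtsInit({ model, voice, config = {} }) {
  if (typeof model !== "string" || model.length === 0) {
    throw new Error("init requires a model identifier");
  }
  if (config.encoding !== undefined && !TTS_ENCODINGS.includes(config.encoding)) {
    throw new Error(`unsupported encoding: ${config.encoding}`);
  }
  // Assumption: numeric options must be positive. The reference does not state ranges.
  for (const key of ["sample_rate", "speed"]) {
    const value = config[key];
    if (value !== undefined && !(typeof value === "number" && value > 0)) {
      throw new Error(`config.${key} must be a positive number`);
    }
  }
  // JSON.stringify omits properties whose value is undefined,
  // so optional fields that were not supplied never reach the wire.
  return JSON.stringify({ type: "init", model, voice, config });
}

console.log(
  buildTtsInit({
    model: "aura:2",
    voice: "aura-2-thalia-en",
    config: { sample_rate: 24000, encoding: "linear16", language: "en" },
  })
);
```

Supported values still vary by model, so treat a passing check here as necessary rather than sufficient.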

#### Send Text for Synthesis

Send text to synthesize into audio. Set `flush: true` to finalize the current segment immediately instead of waiting for more text.

```json theme={null}
{
  "type": "text",
  "text": "Hello from SLNG.",
  "flush": false,
  "pronunciation": {
    "mode": "rewrite",
    "name": "support-pronunciations"
  }
}
```

Use `pronunciation` on a text message to override the active dictionary for that turn. Later turns reuse the most recent active dictionary.

| Field           | Type      | Required | Description                                     |
| --------------- | --------- | -------- | ----------------------------------------------- |
| `type`          | `"text"`  | Yes      | Message type                                    |
| `text`          | `string`  | Yes      | Text to synthesize                              |
| `flush`         | `boolean` | No       | Whether to flush remaining audio immediately    |
| `pronunciation` | `object`  | No       | Pronunciation dictionary override for this turn |
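
When text arrives incrementally (for example, from a language model), one approach is to send each fragment as it comes and set `flush: true` only on the last one. The helper below is a sketch of that pattern. `toTextMessages` is an illustrative name, and skipping whitespace-only fragments is a client-side choice rather than a protocol rule.

```javascript
// Sketch: turn text fragments into `text` messages, flushing on the final one.
// `toTextMessages` is a hypothetical helper, not an SLNG SDK function.
function toTextMessages(fragments) {
  // Client-side choice: do not send fragments that contain only whitespace.
  const nonEmpty = fragments.filter((f) => f.trim().length > 0);
  return nonEmpty.map((text, i) =>
    JSON.stringify({ type: "text", text, flush: i === nonEmpty.length - 1 })
  );
}

const messages = toTextMessages(["Hello from SLNG. ", "", "How can I help?"]);
for (const m of messages) console.log(m);
// {"type":"text","text":"Hello from SLNG. ","flush":false}
// {"type":"text","text":"How can I help?","flush":true}
```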

#### Flush Buffer

Force any buffered text or audio to be finalized and delivered. The server acknowledges with a `flushed` message.

```json theme={null}
{
  "type": "flush"
}
```

#### Clear Buffer

Clear any queued text or audio from the current session. The server acknowledges with a `cleared` message.

```json theme={null}
{
  "type": "clear"
}
```

#### Close Session

Close the current session and stop any further audio generation.

```json theme={null}
{
  "type": "close"
}
```

### Server → Client Messages

#### Session Ready

Indicates the session is ready to receive messages.

```json theme={null}
{
  "type": "ready",
  "session_id": "session_123"
}
```

#### Audio Chunk

Chunk of base64-encoded audio data.

```json theme={null}
{
  "type": "audio_chunk",
  "data": "SGVsbG8gV29ybGQ=",
  "sequence": 1
}
```

| Field      | Type            | Required | Description                  |
| ---------- | --------------- | -------- | ---------------------------- |
| `type`     | `"audio_chunk"` | Yes      | Message type                 |
| `data`     | `string`        | Yes      | Base64-encoded audio data    |
| `sequence` | `integer`       | No       | Sequence number for ordering |

<Tip>Audio may also arrive as raw binary WebSocket frames instead of base64 JSON. Binary frames have lower overhead.</Tip>
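
Because audio can arrive in either framing, a receive handler usually normalizes both to raw bytes first. The sketch below assumes Node.js (for `Buffer`) and a message callback that reports whether the frame was binary, which is the shape the `ws` package uses for its `message` event. `extractAudio` is an illustrative name.

```javascript
// Sketch: normalize incoming audio to a Buffer, whichever framing the server uses.
// Assumes Node.js. `extractAudio` is a hypothetical helper.
function extractAudio(data, isBinary) {
  if (isBinary) {
    // Raw binary frame: the payload is already audio bytes.
    return Buffer.isBuffer(data) ? data : Buffer.from(data);
  }
  const msg = JSON.parse(data.toString());
  if (msg.type !== "audio_chunk") return null; // not audio, handle elsewhere
  return Buffer.from(msg.data, "base64");
}

const fromJson = extractAudio(
  '{"type":"audio_chunk","data":"SGVsbG8gV29ybGQ=","sequence":1}',
  false
);
console.log(fromJson.toString()); // Hello World

const fromBinary = extractAudio(new Uint8Array([1, 2, 3]), true);
console.log(fromBinary.length); // 3
```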

#### Segment Start

Signals the start of a synthesized segment.

```json theme={null}
{
  "type": "segment_start",
  "segment_id": "seg_1"
}
```

#### Segment End

Signals the end of a synthesized segment.

```json theme={null}
{
  "type": "segment_end",
  "segment_id": "seg_1"
}
```

#### Flushed

Acknowledges that buffered output was flushed.

```json theme={null}
{
  "type": "flushed"
}
```

#### Cleared

Acknowledges that queued output was cleared.

```json theme={null}
{
  "type": "cleared"
}
```
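
A common use of `clear` is barge-in: the user interrupts and playback must stop. The sketch below drops audio that arrives between sending `clear` and receiving `cleared`. Treating those in-flight chunks as stale is an assumption about the desired client behavior, not something this reference specifies, and `createClearGate` is an illustrative name.

```javascript
// Sketch: discard audio received between `clear` and its `cleared` acknowledgement.
// Assumption: chunks that arrive in that window belong to the cleared output.
function createClearGate(send) {
  let clearing = false;
  return {
    // Call when the user interrupts playback.
    interrupt() {
      clearing = true;
      send(JSON.stringify({ type: "clear" }));
    },
    // Returns true if the parsed server message should go to the audio player.
    accept(msg) {
      if (msg.type === "cleared") {
        clearing = false;
        return false; // acknowledgement only, nothing to play
      }
      if (msg.type === "audio_chunk") return !clearing;
      return false;
    },
  };
}

const outbox = [];
const gate = createClearGate((s) => outbox.push(s));
console.log(gate.accept({ type: "audio_chunk", data: "AAAA" })); // true
gate.interrupt();
console.log(gate.accept({ type: "audio_chunk", data: "AAAA" })); // false
gate.accept({ type: "cleared" });
console.log(gate.accept({ type: "audio_chunk", data: "AAAA" })); // true
```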

#### Audio End

Signals the end of audio generation.

```json theme={null}
{
  "type": "audio_end",
  "duration": 0.12
}
```

| Field      | Type          | Required | Description    |
| ---------- | ------------- | -------- | -------------- |
| `type`     | `"audio_end"` | Yes      | Message type   |
| `duration` | `number`      | No       | Audio duration |

#### Error

Indicates an error occurred during synthesis.

```json theme={null}
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
```

| Field     | Type      | Required | Description                      |
| --------- | --------- | -------- | -------------------------------- |
| `type`    | `"error"` | Yes      | Message type                     |
| `code`    | `string`  | Yes      | Error code                       |
| `message` | `string`  | Yes      | Human-readable error description |

**Common Error Codes:**

* `auth_error`: Invalid or missing SLNG key
* `config_error`: Invalid configuration
* `rate_limit`: Too many requests
* `provider_error`: Upstream provider error
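
These codes suggest different client reactions. The sketch below retries `rate_limit` and `provider_error` with exponential backoff and treats `auth_error` and `config_error` as permanent. That split, the delay values, and the name `retryDelayMs` are assumptions for illustration, not documented SLNG behavior.

```javascript
// Sketch: decide whether and when to retry after an `error` message.
// Assumption: which codes are retryable is a client policy choice.
const RETRYABLE = new Set(["rate_limit", "provider_error"]);

function retryDelayMs(error, attempt, { baseMs = 500, maxMs = 30000 } = {}) {
  // auth_error and config_error will not succeed on retry: fix the request instead.
  if (!RETRYABLE.has(error.code)) return null;
  // Exponential backoff: baseMs * 2^attempt, capped at maxMs.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const rateLimited = { type: "error", code: "rate_limit", message: "Too many requests" };
console.log(retryDelayMs(rateLimited, 0)); // 500
console.log(retryDelayMs(rateLimited, 3)); // 4000
console.log(retryDelayMs({ type: "error", code: "auth_error", message: "Invalid key" }, 0)); // null
```

When many clients may reconnect at once, adding random jitter to the returned delay helps avoid synchronized retries.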

***

## STT WebSocket Protocol

For the full per-model parameter list and response schema, see the model's page in the **Speech-to-Text** sidebar.

### Connection

```
WSS wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}
```

Use the direct provider path for proxied models and add the `slng/` prefix for
SLNG-hosted models:

| Hosting                | URL pattern                                                  | Example                                                 |
| ---------------------- | ------------------------------------------------------------ | ------------------------------------------------------- |
| Proxied provider model | `wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}`      | `wss://api.slng.ai/v1/stt/deepgram/nova:3`              |
| SLNG-hosted model      | `wss://api.slng.ai/v1/stt/slng/{provider}/{model}:{variant}` | `wss://api.slng.ai/v1/stt/slng/openai/whisper:large-v3` |

### Message Flow

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant SLNG
    Client->>SLNG: Connect
    SLNG-->>Client: Connection open
    Client->>SLNG: { type: "init", config }
    SLNG-->>Client: { type: "ready", session_id }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "partial_transcript" }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "final_transcript" }
    Note over Client: Silence — send keepalive
    Client->>SLNG: { type: "keepalive" }
    Client->>SLNG: binary audio data
    SLNG-->>Client: { type: "final_transcript" }
    Client->>SLNG: { type: "finalize" }
    SLNG-->>Client: { type: "final_transcript" }
    Client->>SLNG: { type: "close" }
```

Browse all available STT models and endpoints on the [Speech-to-Text models](/models/stt) page.

### Client → Server Messages

#### Initialize Session

Initialize a session with recognition configuration before streaming audio.

```json theme={null}
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "enable_vad": true,
    "enable_diarization": false,
    "enable_word_timestamps": true,
    "enable_partials": true
  }
}
```

**Parameters:**

| Field                           | Type                                | Required | Description                        |
| ------------------------------- | ----------------------------------- | -------- | ---------------------------------- |
| `type`                          | `"init"`                            | Yes      | Message type                       |
| `config`                        | `object`                            | No       | Recognition configuration          |
| `config.language`               | `string`                            | No       | Language code for recognition      |
| `config.sample_rate`            | `number`                            | No       | Audio sample rate in Hz            |
| `config.encoding`               | `"linear16"` \| `"mp3"` \| `"opus"` | No       | Audio encoding format              |
| `config.enable_vad`             | `boolean`                           | No       | Enable voice activity detection    |
| `config.enable_diarization`     | `boolean`                           | No       | Enable speaker diarization         |
| `config.enable_word_timestamps` | `boolean`                           | No       | Include word-level timestamps      |
| `config.enable_partials`        | `boolean`                           | No       | Enable partial/interim transcripts |

#### Send Audio Data

Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:

<Tip>Binary frames are recommended over base64-encoded JSON for lower overhead.</Tip>

**Binary frames**: send raw PCM audio samples directly as binary WebSocket frames:

```javascript theme={null}
ws.send(audioBuffer); // ArrayBuffer or Uint8Array
```

**JSON messages** with base64-encoded data:

```json theme={null}
{
  "type": "audio",
  "data": "SGVsbG8gV29ybGQ="
}
```

| Field  | Type      | Required | Description               |
| ------ | --------- | -------- | ------------------------- |
| `type` | `"audio"` | Yes      | Message type              |
| `data` | `string`  | Yes      | Base64-encoded audio data |
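
Audio is usually streamed in small, fixed-duration frames rather than one large buffer. For `linear16`, frame size follows from the sample rate: bytes per frame = sample rate × frame duration × 2. The sketch below assumes mono audio held in a `Uint8Array` (or Node.js `Buffer`). The 20 ms frame length is a common choice, not an SLNG requirement, and `framePcm` is an illustrative name.

```javascript
// Sketch: split linear16 PCM bytes into fixed-duration frames for ws.send().
// Assumes mono audio; `pcm` is a Uint8Array or Buffer of raw sample bytes.
function framePcm(pcm, sampleRate, frameMs = 20) {
  const bytesPerSample = 2; // linear16 uses 16 bits per sample
  const frameBytes = Math.floor((sampleRate * frameMs) / 1000) * bytesPerSample;
  if (frameBytes <= 0) throw new Error("frame size must be positive");
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    // subarray creates a view without copying; the last frame may be shorter.
    frames.push(pcm.subarray(offset, offset + frameBytes));
  }
  return frames;
}

const oneSecond = new Uint8Array(16000 * 2); // 1 s of silence at 16 kHz
const frames = framePcm(oneSecond, 16000);
console.log(frames.length, frames[0].length); // 50 640
```

Each element of `frames` can be passed directly to `ws.send()` as a binary frame.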

#### Finalize Transcription

Force the server to finalize any buffered audio and return results. The connection stays open so you can continue streaming.

```json theme={null}
{
  "type": "finalize"
}
```

#### Close Stream

Signal that no more audio will be sent. The server processes any remaining audio, sends final results, then closes the connection.

```json theme={null}
{
  "type": "close"
}
```

<Tip>Use `finalize` when you want to flush results mid-session (e.g., between utterances). Use `close` when you are done and want to end the session.</Tip>

#### Keep-Alive

Send a `keepalive` message periodically during silence to keep the server from closing an idle connection.

```json theme={null}
{
  "type": "keepalive"
}
```
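
One way to schedule keep-alives is to track when audio was last sent and emit `keepalive` only after a quiet period. The sketch below does that with an injectable clock. The 5-second default is an assumption, since this page does not state the idle timeout, and `createKeepAlive` is an illustrative name.

```javascript
// Sketch: send `keepalive` only when no audio has gone out recently.
// Assumption: a 5 s interval; the actual idle timeout is not documented here.
function createKeepAlive(send, intervalMs = 5000, now = Date.now) {
  let lastActivity = now();
  return {
    // Call after every audio frame you send.
    noteAudioSent() {
      lastActivity = now();
    },
    // Call from a timer, e.g. setInterval(() => keepAlive.tick(), 1000).
    // Returns true if a keepalive was sent on this tick.
    tick() {
      if (now() - lastActivity < intervalMs) return false;
      send(JSON.stringify({ type: "keepalive" }));
      lastActivity = now();
      return true;
    },
  };
}

// Usage with a fake clock so the behavior is visible without waiting.
let clock = 0;
const keepAliveOut = [];
const keepAlive = createKeepAlive((s) => keepAliveOut.push(s), 5000, () => clock);
clock = 3000;
console.log(keepAlive.tick()); // false (only 3 s of silence)
clock = 6000;
console.log(keepAlive.tick()); // true (6 s of silence, keepalive sent)
```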

### Server → Client Messages

#### Session Ready

Indicates the session is ready to receive audio.

```json theme={null}
{
  "type": "ready",
  "session_id": "session_123"
}
```

#### Partial Transcript

Interim transcription result.

```json theme={null}
{
  "type": "partial_transcript",
  "transcript": "Hello",
  "confidence": 0.91
}
```

| Field        | Type                   | Required | Description             |
| ------------ | ---------------------- | -------- | ----------------------- |
| `type`       | `"partial_transcript"` | Yes      | Message type            |
| `transcript` | `string`               | Yes      | Transcribed text so far |
| `confidence` | `number`               | No       | Confidence score (0-1)  |

#### Final Transcript

Final transcription result with optional metadata.

```json theme={null}
{
  "type": "final_transcript",
  "transcript": "Hello world",
  "confidence": 0.97,
  "language": "en",
  "duration": 2.5
}
```

| Field        | Type                 | Required | Description                         |
| ------------ | -------------------- | -------- | ----------------------------------- |
| `type`       | `"final_transcript"` | Yes      | Message type                        |
| `transcript` | `string`             | Yes      | Complete transcribed text           |
| `confidence` | `number`             | No       | Overall confidence score (0-1)      |
| `language`   | `string`             | No       | Detected or specified language code |
| `duration`   | `number`             | No       | Audio duration                      |
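
Clients typically show partial results as provisional text and commit final results. The sketch below assumes that each `partial_transcript` replaces the previous one and that each `final_transcript` closes a segment. Check your model's page to confirm this, since interim behavior can differ by provider. `createTranscript` is an illustrative name.

```javascript
// Sketch: combine partial and final transcripts into one display string.
// Assumption: a partial replaces the previous partial; a final closes the segment.
function createTranscript() {
  const finals = [];
  let interim = "";

  const text = () => [...finals, interim].filter(Boolean).join(" ");

  return {
    text,
    // Pass each parsed server message; returns the text to display.
    apply(msg) {
      if (msg.type === "partial_transcript") {
        interim = msg.transcript;
      } else if (msg.type === "final_transcript") {
        if (msg.transcript.length > 0) finals.push(msg.transcript);
        interim = "";
      }
      return text();
    },
  };
}

const transcript = createTranscript();
console.log(transcript.apply({ type: "partial_transcript", transcript: "Hello" }));
// Hello
console.log(transcript.apply({ type: "final_transcript", transcript: "Hello world" }));
// Hello world
console.log(transcript.apply({ type: "partial_transcript", transcript: "How" }));
// Hello world How
```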

#### Error

Indicates an error occurred during recognition.

```json theme={null}
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
```

| Field     | Type      | Required | Description                      |
| --------- | --------- | -------- | -------------------------------- |
| `type`    | `"error"` | Yes      | Message type                     |
| `code`    | `string`  | Yes      | Error code                       |
| `message` | `string`  | Yes      | Human-readable error description |

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Integration guide" icon="wrench" href="/websocket-guide">
    Best practices and troubleshooting
  </Card>

  <Card title="TTS examples" icon="audio-lines" href="/examples/tts-websocket">
    JavaScript and Python code for real-time TTS
  </Card>

  <Card title="STT examples" icon="ear" href="/examples/stt-websocket">
    JavaScript and Python code for real-time STT
  </Card>

  <Card title="Protocol comparison" icon="arrow-left-right" href="/protocols">
    HTTP vs. WebSocket: when to use each
  </Card>
</CardGroup>