livekit-plugins-slng adds STT and TTS adapters for LiveKit Agents. It lets you use any model on the SLNG gateway from within a LiveKit agent.

Prerequisites

Installation

pip install livekit-plugins-slng

Credentials

You need an SLNG API key. The plugin reads it from the SLNG_API_KEY environment variable automatically:
export SLNG_API_KEY="your-slng-api-key"
You can also pass it explicitly via api_key:
stt = slng.STT(api_key="your-slng-api-key", model="deepgram/nova:3")
slng.STT also accepts a legacy api_token= alias, but it is deprecated. Use api_key in new code.
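If you want your own code to fail fast when no key is configured, the lookup order described above (explicit argument first, then the environment variable) can be sketched as follows. The helper name resolve_api_key is hypothetical and is not part of the plugin, which performs its own lookup.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise fall back to SLNG_API_KEY.

    Hypothetical helper for application code; the plugin does its own lookup.
    """
    key = api_key or os.environ.get("SLNG_API_KEY")
    if not key:
        raise ValueError("Set SLNG_API_KEY or pass api_key explicitly")
    return key
```

Calling resolve_api_key() at startup surfaces a missing key before the agent tries to open a connection.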

Quickstart

Create an STT and TTS instance, then pass them to your LiveKit agent session:
from livekit.plugins import slng

stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    region_override="eu-west-1",
    language="en",
)

tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    region_override=["eu-west-1", "us-east-1"],
    voice="aura-2-thalia-en",
    language="en",
)

Region override

The plugin supports gateway region routing through the region_override option on both STT and TTS. This maps directly to the gateway’s X-Region-Override header. You can pass either a single region:
stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    region_override="eu-west-1",
)
Or multiple preferred regions in priority order:
tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    voice="aura-2-thalia-en",
    region_override=["eu-west-1", "us-east-1"],
)
See the full region list and override behavior at docs.slng.ai/region-override.
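Because region_override accepts either a string or a list, application code that builds the value from configuration may want to normalize it first. The sketch below is a hypothetical helper, not plugin code. It only shapes the Python value and makes no claim about how the plugin encodes a list into the X-Region-Override header, which this page does not specify.

```python
from typing import List, Sequence, Union


def normalize_regions(region_override: Union[str, Sequence[str], None]) -> List[str]:
    """Return preferred regions as a list, keeping priority order and dropping repeats."""
    if region_override is None:
        return []
    if isinstance(region_override, str):
        regions = [region_override]
    else:
        regions = list(region_override)
    seen = set()
    ordered = []
    for region in regions:
        region = region.strip()
        # Skip empty entries and anything already listed at a higher priority.
        if region and region not in seen:
            seen.add(region)
            ordered.append(region)
    return ordered
```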

Full voice agent example

This example wires up STT, TTS, and VAD into a complete LiveKit agent that greets the user on join:
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import silero, slng

SLNG_API_KEY = "your-slng-api-key"


class MyAgent(Agent):
    async def on_enter(self):
        await self.session.say("Hello! How can I help?")


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    stt = slng.STT(
        api_key=SLNG_API_KEY,
        model="deepgram/nova:3",
        language="en",
        sample_rate=16000,
        enable_partial_transcripts=True,
    )

    tts = slng.TTS(
        api_key=SLNG_API_KEY,
        model="deepgram/aura:2",
        voice="aura-2-thalia-en",
        language="en",
        sample_rate=24000,
    )

    # `turn_detection` and `allow_interruptions` still work in livekit-agents 1.5.x
    # but will be removed in v2.0. Use `turn_handling=TurnHandlingOptions(...)` going forward.
    session = AgentSession(
        stt=stt,
        tts=tts,
        vad=silero.VAD.load(),
        turn_detection="vad",
        allow_interruptions=True,
    )

    await session.start(agent=MyAgent(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Model identifiers

Models follow the format provider/model:variant. Prefix with slng/ to target an SLNG-hosted instance:
provider/model:variant          # third-party passthrough
slng/provider/model:variant     # SLNG-hosted
Examples:
model="deepgram/nova:3"              # Deepgram Nova 3 (passthrough)
model="slng/deepgram/nova:3-en"      # SLNG-hosted Deepgram Nova 3, English
model="elevenlabs/eleven-flash:2.5"  # ElevenLabs Flash v2.5 (passthrough)
See the Models page for the full list of available models.
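To make the identifier format concrete, here is a minimal parsing sketch based only on the two patterns shown above. ModelId and parse_model_id are hypothetical names for illustration; the plugin takes the identifier as a plain string and does not expose a parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelId:
    provider: str
    model: str
    variant: Optional[str]
    slng_hosted: bool


def parse_model_id(identifier: str) -> ModelId:
    """Split '[slng/]provider/model:variant' into its parts."""
    parts = identifier.split("/")
    # A leading 'slng' segment on a three-part path marks an SLNG-hosted model.
    slng_hosted = len(parts) == 3 and parts[0] == "slng"
    if slng_hosted:
        parts = parts[1:]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected [slng/]provider/model:variant, got {identifier!r}")
    provider, rest = parts
    model, sep, variant = rest.partition(":")
    if not model:
        raise ValueError(f"Missing model name in {identifier!r}")
    return ModelId(provider, model, variant if sep else None, slng_hosted)
```

For example, parse_model_id("slng/deepgram/nova:3-en") yields provider "deepgram", model "nova", variant "3-en", with slng_hosted set to True.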

STT reference

slng.STT streams speech-to-text over WebSocket. It supports multi-endpoint failover.

Constructor

stt = slng.STT(
    api_key="your-slng-api-key",          # SLNG API key. Falls back to the SLNG_API_KEY env var if omitted.
    model="deepgram/nova:3",              # Model identifier. Default: "deepgram/nova:3"
    language="en",                         # Language code. Default: "en"
    sample_rate=16000,                     # Audio sample rate in Hz. Default: 16000
    encoding="pcm_s16le",                  # Audio encoding: "pcm_s16le" or "pcm_mulaw". Default: "pcm_s16le"
    buffer_size_seconds=0.064,             # Audio buffer size in seconds. Default: 0.064
    enable_partial_transcripts=True,       # Enable interim results. Default: True
    enable_diarization=False,              # Enable speaker identification. Default: False
    min_speakers=None,                     # Minimum speakers for diarization. Default: None
    max_speakers=None,                     # Maximum speakers for diarization. Default: None
    vad_threshold=0.5,                     # Voice activity detection threshold. Default: 0.5
    vad_min_silence_duration_ms=300,       # Minimum silence for VAD (ms). Default: 300
    vad_speech_pad_ms=30,                  # Speech padding for VAD (ms). Default: 30
    model_endpoint=None,                   # Optional explicit WebSocket endpoint URL
    model_endpoints=None,                  # Optional list of failover endpoints
    slng_base_url="api.slng.ai",           # Gateway host override (self-hosted or staging)
    region_override=None,                  # Optional region override, sent as X-Region-Override
    http_session=None,                     # Optional reused aiohttp.ClientSession
    # **model_options                      # Arbitrary model-specific kwargs (see below)
)
Any additional keyword arguments are forwarded as model-specific options. For example, pass whisper_params={"task": "translate"} for Whisper, or target_language_code="hi-IN" to override language normalization for Sarvam STT models.
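As a rough guide to what the sample_rate, encoding, and buffer_size_seconds defaults mean in bytes, the arithmetic can be sketched like this. It assumes mono audio, 2 bytes per sample for pcm_s16le, and 1 byte per sample for pcm_mulaw. How the plugin actually chunks audio internally is not documented here, so treat the result as an estimate.

```python
# Bytes per sample for the two encodings listed in the constructor (mono assumed).
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "pcm_mulaw": 1}


def buffer_size_bytes(
    sample_rate: int = 16000,
    buffer_size_seconds: float = 0.064,
    encoding: str = "pcm_s16le",
) -> int:
    """Estimate how many bytes one audio buffer holds."""
    samples = round(sample_rate * buffer_size_seconds)
    return samples * BYTES_PER_SAMPLE[encoding]
```

With the defaults, 16000 Hz times 0.064 s is 1024 samples, which is 2048 bytes of pcm_s16le or 1024 bytes of pcm_mulaw.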

Endpoint failover

Pass a list of endpoints to model_endpoints. If the first one fails, the plugin tries the next in order:
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model_endpoints=[
        "wss://api.slng.ai/v1/stt/deepgram/nova:3",
        "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en",
    ],
    language="en",
)
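
The "try the next in order" behavior can be pictured with the generic sketch below. It is not the plugin's implementation: connect_with_failover and the connect callback are hypothetical, and the real plugin may differ in how it handles timeouts, retries, and error types.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


async def connect_with_failover(
    endpoints: Sequence[str],
    connect: Callable[[str], Awaitable[T]],
) -> T:
    """Try each endpoint in order and return the first successful connection."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            return await connect(endpoint)
        except Exception as exc:  # broad on purpose: any failure moves on to the next endpoint
            last_error = exc
    raise ConnectionError(f"All {len(endpoints)} endpoints failed") from last_error
```

Ordering matters in this model: put the endpoint you prefer first, and keep the fallback list short so a total outage is reported quickly.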

Default endpoint

If model_endpoint is omitted, the plugin connects to:
wss://api.slng.ai/v1/stt/{model}
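
The default URL pattern can be reproduced with a small helper, which is handy for logging or for building a model_endpoints list. This is a sketch: default_endpoint is a hypothetical function, and it assumes that slng_base_url simply replaces the host in the template above.

```python
def default_endpoint(model: str, kind: str = "stt", base_url: str = "api.slng.ai") -> str:
    """Build the default WebSocket URL for an STT or TTS model."""
    if kind not in ("stt", "tts"):
        raise ValueError(f"kind must be 'stt' or 'tts', got {kind!r}")
    return f"wss://{base_url}/v1/{kind}/{model}"
```

For example, default_endpoint("deepgram/nova:3") gives wss://api.slng.ai/v1/stt/deepgram/nova:3.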

Region override

To force routing toward one or more preferred gateway regions, pass region_override. This value is forwarded directly as the gateway X-Region-Override header.
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model="deepgram/nova:3",
    region_override=["eu-west-1", "us-east-1"],
)
See the full region list and override behavior at docs.slng.ai/region-override.

TTS reference

slng.TTS streams text-to-speech over WebSocket with connection pooling.

Constructor

tts = slng.TTS(
    api_key="your-slng-api-key",          # SLNG API key. Falls back to the SLNG_API_KEY env var if omitted.
    model="deepgram/aura:2",              # Model identifier. Default: "deepgram/aura:2"
    voice="aura-2-thalia-en",             # Voice identifier. Default: "default"
    language="en",                         # Language code. Default: "en"
    sample_rate=24000,                     # Audio sample rate in Hz. Default: 24000
    speed=1.0,                             # Speech speed multiplier. Default: 1.0
    model_endpoint=None,                   # Optional explicit WebSocket endpoint URL
    slng_base_url="api.slng.ai",           # Gateway host override (self-hosted or staging)
    region_override=None,                  # Optional region override, sent as X-Region-Override
    word_tokenizer=None,                   # Optional custom tokenize.WordTokenizer
    http_session=None,                     # Optional reused aiohttp.ClientSession
    # **model_options                      # Arbitrary model-specific kwargs (see below)
)
Additional keyword arguments are forwarded to the chosen model’s init payload. Known keys by provider:
  • Rime Arcana: modelId, segment, speakingStyle, addBreathing, addDisfluencies, phonemizeBetweenBrackets, translateTo.
  • Sarvam Bulbul: pace, temperature, output_audio_bitrate, min_buffer_size, max_chunk_length, target_language_code.

Streaming vs batch

  • tts.stream() sends text word-by-word and returns audio chunks in real time. Use this for voice agents.
  • tts.synthesize(text) synthesizes the whole text in one request. It works for previews, but stream() is the better fit for interactive agents.

Default endpoint

If model_endpoint is omitted, the plugin connects to:
wss://api.slng.ai/v1/tts/{model}

Region override

TTS supports the same region_override option and forwards it to the gateway as X-Region-Override. See the full region list and override behavior at docs.slng.ai/region-override.

Voice selection

Pick a voice that matches your chosen model. See the Voices pages for what’s available per provider.

Provider notes

Sarvam Bulbul v3 TTS

Works out of the box. The plugin normalizes language codes to region-qualified BCP-47 tags on the wire: if you pass language="hi", it sends "hi-IN" to Sarvam. To override the normalization (for example, to force a different target language), pass target_language_code="..." as a model option.
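
The normalization and its override can be illustrated with the sketch below. Only the "hi" to "hi-IN" mapping comes from this page. The function name wire_language, the lookup table, and the pass-through rule for codes that already carry a region are assumptions made for illustration, not the plugin's actual logic.

```python
from typing import Optional

# Only this entry is documented; a real table would cover more languages.
ASSUMED_DEFAULT_REGION = {"hi": "hi-IN"}


def wire_language(language: str, target_language_code: Optional[str] = None) -> str:
    """Return the language tag that would be sent to the provider."""
    if target_language_code:
        # An explicit override wins over any normalization.
        return target_language_code
    if "-" in language:
        # Already region-qualified, so leave it alone (assumed behavior).
        return language
    return ASSUMED_DEFAULT_REGION.get(language, language)
```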

Sarvam Saaras v3 STT

Saaras on SLNG is HTTP-only (no WebSocket endpoint) and is therefore not supported by this plugin’s realtime streaming path. For Hindi STT in a voice agent, use slng/deepgram/nova:3-hi or slng/deepgram/nova:3-multi instead.

Rime Arcana

Requires a voice (speaker) that matches the chosen language. Passing voice="default" auto-resolves to a reasonable default per language.

General notes

The plugin outputs linear16 PCM audio internally and registers itself with LiveKit on import. Both STT and TTS authenticate with api_key.
Most new SLNG gateway models work without plugin updates, but providers with non-standard WebSocket message formats may require plugin support (for example, Sarvam Bulbul needed nested data.audio parsing).
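
To show why a nested message shape needs explicit support, here is a minimal sketch of a parser that accepts both a flat and a nested layout. The flat {"audio": ...} shape and the function name extract_audio_field are assumptions for illustration. Only the nested data.audio case is mentioned on this page, and the real message schemas are provider-specific.

```python
from typing import Any, Optional


def extract_audio_field(message: dict) -> Optional[Any]:
    """Return the audio payload from a flat or nested message, or None if absent."""
    if "audio" in message:
        # Assumed flat shape: {"audio": ...}
        return message["audio"]
    data = message.get("data")
    if isinstance(data, dict):
        # Nested shape: {"data": {"audio": ...}}
        return data.get("audio")
    return None
```

A parser that only checks the top-level key would silently drop audio from a provider that nests it, which is the kind of mismatch that requires a plugin update.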

Next steps

  • Browse available Models for STT and TTS
  • Check the Voices pages for voice options per provider
  • See Voice Agents for the SLNG-managed agents API