livekit-plugins-slng adds STT and TTS adapters for LiveKit Agents. It lets you use any model on the SLNG gateway from within a LiveKit agent.
Prerequisites
Installation
pip install livekit-plugins-slng
Credentials
You need an SLNG API key. The plugin reads it from the SLNG_API_KEY environment variable automatically:
export SLNG_API_KEY="your-slng-api-key"
You can also pass it explicitly via api_key:
stt = slng.STT(api_key="your-slng-api-key", model="deepgram/nova:3")
slng.STT also accepts a legacy api_token= alias, but it is deprecated; use api_key in new code.
Quickstart
Create an STT and TTS instance, then pass them to your LiveKit agent session:
from livekit.plugins import slng
stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    region_override="eu-west-1",
    language="en",
)
tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    region_override=["eu-west-1", "us-east-1"],
    voice="aura-2-thalia-en",
    language="en",
)
Region override
The plugin supports gateway region routing through the region_override option on both STT and TTS.
This maps directly to the gateway’s X-Region-Override header.
You can pass either a single region:
stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    region_override="eu-west-1",
)
Or multiple preferred regions in priority order:
tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    voice="aura-2-thalia-en",
    region_override=["eu-west-1", "us-east-1"],
)
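How region_override maps onto the header can be sketched as follows. This is an illustration only, not the plugin's actual code, and the comma-joined wire format for multiple regions is an assumption:

```python
def region_override_header(region_override):
    """Build the X-Region-Override header value from a str or list of regions.

    Simplified sketch: a single region is sent as-is; a list of preferred
    regions is assumed to be joined in priority order.
    """
    if region_override is None:
        return {}
    if isinstance(region_override, str):
        return {"X-Region-Override": region_override}
    return {"X-Region-Override": ",".join(region_override)}

print(region_override_header(["eu-west-1", "us-east-1"]))
# {'X-Region-Override': 'eu-west-1,us-east-1'}
```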
See the full region list and override behavior at docs.slng.ai/region-override.
Full voice agent example
This example wires up STT, TTS, and VAD into a complete LiveKit agent that greets the user on join:
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import silero, slng

SLNG_API_KEY = "your-slng-api-key"


class MyAgent(Agent):
    async def on_enter(self):
        await self.session.say("Hello! How can I help?")


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    stt = slng.STT(
        api_key=SLNG_API_KEY,
        model="deepgram/nova:3",
        language="en",
        sample_rate=16000,
        enable_partial_transcripts=True,
    )
    tts = slng.TTS(
        api_key=SLNG_API_KEY,
        model="deepgram/aura:2",
        voice="aura-2-thalia-en",
        language="en",
        sample_rate=24000,
    )

    # `turn_detection` and `allow_interruptions` still work in livekit-agents 1.5.x
    # but will be removed in v2.0. Use `turn_handling=TurnHandlingOptions(...)` going forward.
    session = AgentSession(
        stt=stt,
        tts=tts,
        vad=silero.VAD.load(),
        turn_detection="vad",
        allow_interruptions=True,
    )
    await session.start(agent=MyAgent(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Model identifiers
Models follow the format provider/model:variant. Prefix with slng/ to target an SLNG-hosted instance:
provider/model:variant # third-party passthrough
slng/provider/model:variant # SLNG-hosted
Examples:
model="deepgram/nova:3" # Deepgram Nova 3 (passthrough)
model="slng/deepgram/nova:3-en" # SLNG-hosted Deepgram Nova 3, English
model="elevenlabs/eleven-flash:2.5" # ElevenLabs Flash v2.5 (passthrough)
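The identifier format can be parsed as in the sketch below (a hypothetical helper for illustration, not part of the plugin):

```python
def parse_model_id(model: str):
    """Split a model identifier into (slng_hosted, provider, model, variant).

    Handles both "provider/model:variant" and "slng/provider/model:variant".
    """
    hosted = model.startswith("slng/")
    if hosted:
        model = model[len("slng/"):]
    provider, _, rest = model.partition("/")
    name, _, variant = rest.partition(":")
    return hosted, provider, name, variant

print(parse_model_id("slng/deepgram/nova:3-en"))
# (True, 'deepgram', 'nova', '3-en')
```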
See the Models page for the full list of available models.
STT reference
slng.STT streams speech-to-text over WebSocket. It supports multi-endpoint failover.
Constructor
stt = slng.STT(
    api_key="your-slng-api-key",      # SLNG API key. Falls back to the SLNG_API_KEY env var if omitted.
    model="deepgram/nova:3",          # Model identifier. Default: "deepgram/nova:3"
    language="en",                    # Language code. Default: "en"
    sample_rate=16000,                # Audio sample rate in Hz. Default: 16000
    encoding="pcm_s16le",             # Audio encoding: "pcm_s16le" or "pcm_mulaw". Default: "pcm_s16le"
    buffer_size_seconds=0.064,        # Audio buffer size in seconds. Default: 0.064
    enable_partial_transcripts=True,  # Enable interim results. Default: True
    enable_diarization=False,         # Enable speaker identification. Default: False
    min_speakers=None,                # Minimum speakers for diarization. Default: None
    max_speakers=None,                # Maximum speakers for diarization. Default: None
    vad_threshold=0.5,                # Voice activity detection threshold. Default: 0.5
    vad_min_silence_duration_ms=300,  # Minimum silence for VAD (ms). Default: 300
    vad_speech_pad_ms=30,             # Speech padding for VAD (ms). Default: 30
    model_endpoint=None,              # Optional explicit WebSocket endpoint URL
    model_endpoints=None,             # Optional list of failover endpoints
    slng_base_url="api.slng.ai",      # Gateway host override (self-hosted or staging)
    region_override=None,             # Optional region override, sent as X-Region-Override
    http_session=None,                # Optional reused aiohttp.ClientSession
    # **model_options                 # Arbitrary model-specific kwargs (see below)
)
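With the defaults above, each audio buffer holds sample_rate × buffer_size_seconds samples; since pcm_s16le uses 2 bytes per sample, the buffer size in bytes follows directly:

```python
sample_rate = 16000          # Hz, the default above
buffer_size_seconds = 0.064  # the default above

samples_per_buffer = int(sample_rate * buffer_size_seconds)  # 16000 * 0.064 = 1024 samples
bytes_per_buffer = samples_per_buffer * 2                    # pcm_s16le = 2 bytes per sample

print(samples_per_buffer, bytes_per_buffer)  # 1024 2048
```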
Any additional keyword arguments are forwarded as model-specific options — for example, whisper_params={"task": "translate"} for Whisper, or target_language_code="hi-IN" to override language normalization for Sarvam STT models.
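The forwarding behavior can be sketched like this (an illustration of the **model_options pattern only, not the real constructor, which also accepts all the named parameters above):

```python
def make_stt_options(model="deepgram/nova:3", language="en", **model_options):
    """Collect the known parameters plus arbitrary model-specific extras,
    mirroring how unrecognized kwargs are forwarded as model options."""
    return {"model": model, "language": language, **model_options}

opts = make_stt_options(whisper_params={"task": "translate"})
print(opts["whisper_params"])  # {'task': 'translate'}
```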
Endpoint failover
Pass a list of endpoints to model_endpoints. If the first one fails, the plugin tries the next in order:
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model_endpoints=[
        "wss://api.slng.ai/v1/stt/deepgram/nova:3",
        "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en",
    ],
    language="en",
)
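Conceptually, failover walks the endpoint list in priority order until one connects. A simplified sketch (not the plugin's actual implementation) with a stubbed connect function:

```python
def connect_with_failover(endpoints, connect):
    """Try each endpoint in priority order; return the first successful
    connection, or re-raise the last failure if all endpoints fail."""
    last_error = None
    for url in endpoints:
        try:
            return connect(url)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try the next endpoint
    raise last_error

# Stub: the first endpoint "fails", so the second is used.
def fake_connect(url):
    if "nova:3-en" not in url:
        raise ConnectionError(f"cannot reach {url}")
    return url

chosen = connect_with_failover(
    [
        "wss://api.slng.ai/v1/stt/deepgram/nova:3",
        "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en",
    ],
    fake_connect,
)
print(chosen)  # wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en
```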
Default endpoint
If model_endpoint is omitted, the plugin connects to:
wss://api.slng.ai/v1/stt/{model}
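The model identifier substitutes directly into the URL template, which can be illustrated with a small helper (hypothetical, for demonstration):

```python
def default_stt_endpoint(model: str, slng_base_url: str = "api.slng.ai") -> str:
    """Build the default STT WebSocket URL from the model identifier."""
    return f"wss://{slng_base_url}/v1/stt/{model}"

print(default_stt_endpoint("deepgram/nova:3"))
# wss://api.slng.ai/v1/stt/deepgram/nova:3
```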
Region override
To force routing toward one or more preferred gateway regions, pass region_override.
This value is forwarded directly as the gateway X-Region-Override header.
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model="deepgram/nova:3",
    region_override=["eu-west-1", "us-east-1"],
)
See the full region list and override behavior at docs.slng.ai/region-override.
TTS reference
slng.TTS streams text-to-speech over WebSocket with connection pooling.
Constructor
tts = slng.TTS(
    api_key="your-slng-api-key",  # SLNG API key. Falls back to the SLNG_API_KEY env var if omitted.
    model="deepgram/aura:2",      # Model identifier. Default: "deepgram/aura:2"
    voice="aura-2-thalia-en",     # Voice identifier. Default: "default"
    language="en",                # Language code. Default: "en"
    sample_rate=24000,            # Audio sample rate in Hz. Default: 24000
    speed=1.0,                    # Speech speed multiplier. Default: 1.0
    model_endpoint=None,          # Optional explicit WebSocket endpoint URL
    slng_base_url="api.slng.ai",  # Gateway host override (self-hosted or staging)
    region_override=None,         # Optional region override, sent as X-Region-Override
    word_tokenizer=None,          # Optional custom tokenize.WordTokenizer
    http_session=None,            # Optional reused aiohttp.ClientSession
    # **model_options             # Arbitrary model-specific kwargs (see below)
)
Additional keyword arguments are forwarded to the chosen model’s init payload. Known keys by provider:
- Rime Arcana:
modelId, segment, speakingStyle, addBreathing, addDisfluencies, phonemizeBetweenBrackets, translateTo.
- Sarvam Bulbul:
pace, temperature, output_audio_bitrate, min_buffer_size, max_chunk_length, target_language_code.
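The provider keys above are passed as plain keyword arguments. As a self-contained sketch of a Sarvam Bulbul option set (the values are illustrative examples, not documented defaults):

```python
# Illustrative Sarvam Bulbul model options, as they would be forwarded to the
# model's init payload when passed as keyword arguments to slng.TTS.
sarvam_options = {
    "pace": 1.1,                      # speech pace multiplier (example value)
    "temperature": 0.7,               # sampling temperature (example value)
    "target_language_code": "hi-IN",  # explicit BCP-47 target language
}

# e.g. tts = slng.TTS(..., **sarvam_options)
print(sorted(sarvam_options))
```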
Streaming vs batch
tts.stream() sends text word-by-word and returns audio chunks in real time. Use this for voice agents.
tts.synthesize(text) does one-shot synthesis. Works fine for previews, but stream() is better for interactive agents.
Default endpoint
If model_endpoint is omitted, the plugin connects to:
wss://api.slng.ai/v1/tts/{model}
Region override
TTS supports the same region_override option and forwards it to the gateway as X-Region-Override.
See the full region list and override behavior at docs.slng.ai/region-override.
Voice selection
Pick a voice that matches your chosen model. See the Voices pages for what’s available per provider.
Provider notes
Sarvam Bulbul v3 TTS
Works out of the box. The plugin auto-normalizes language codes to BCP-47 on the wire: pass language="hi" and the plugin sends "hi-IN" to Sarvam. To override the normalization (e.g. to force a different target language), pass target_language_code="..." in model_options.
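The normalization can be sketched roughly as below; the mapping table is a partial assumption, with only the "hi" → "hi-IN" entry coming from this page:

```python
# Partial, assumed mapping from short language codes to the BCP-47 codes sent
# on the wire; "hi" -> "hi-IN" is the documented example.
_BCP47 = {"hi": "hi-IN"}

def normalize_language(language, target_language_code=None):
    """Return the explicit override if given, else the normalized code
    (codes without a known mapping pass through unchanged)."""
    if target_language_code is not None:
        return target_language_code
    return _BCP47.get(language, language)

print(normalize_language("hi"))                               # hi-IN
print(normalize_language("hi", target_language_code="bn-IN")) # bn-IN
```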
Sarvam Saaras v3 STT
Saaras on SLNG is HTTP-only (no WebSocket endpoint) and is therefore not supported by this plugin’s realtime streaming path. For Hindi STT in a voice agent, use slng/deepgram/nova:3-hi or slng/deepgram/nova:3-multi instead.
Rime Arcana
Requires a voice (speaker) that matches the chosen language. Passing voice="default" auto-resolves to a reasonable default per language.
The plugin outputs linear16 PCM audio internally and registers itself with LiveKit on import. Both STT and TTS authenticate with api_key.
Most new SLNG gateway models work without plugin updates, but providers with non-standard WebSocket message formats may require plugin support (for example, Sarvam Bulbul needed nested data.audio parsing).
Next steps
- Browse available Models for STT and TTS
- Check the Voices pages for voice options per provider
- See Voice Agents for the SLNG-managed agents API