# livekit-plugins-slng

`livekit-plugins-slng` adds STT and TTS adapters for LiveKit Agents, letting you use any model on the SLNG gateway from within a LiveKit agent.
## Prerequisites
### Installation

```bash
pip install livekit-plugins-slng
```
### Credentials

You need an SLNG API key. The plugin reads it from the `SLNG_API_KEY` environment variable automatically:

```bash
export SLNG_API_KEY="your-slng-api-key"
```
You can also pass it explicitly via `api_key`:

```python
stt = slng.STT(api_key="your-slng-api-key", model="deepgram/nova:3")
```
## Quickstart

Create an STT and a TTS instance, then pass them to your LiveKit agent session:

```python
from livekit.plugins import slng

stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    language="en",
)

tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    voice="aura-2-thalia-en",
    language="en",
)
```
## Full voice agent example

This example wires STT, TTS, and VAD into a complete LiveKit agent that greets the user on join:

```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import silero, slng

SLNG_API_KEY = "your-slng-api-key"


class MyAgent(Agent):
    async def on_enter(self):
        await self.session.say("Hello! How can I help?")


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    stt = slng.STT(
        api_key=SLNG_API_KEY,
        model="deepgram/nova:3",
        language="en",
        sample_rate=16000,
        enable_partial_transcripts=True,
    )

    tts = slng.TTS(
        api_key=SLNG_API_KEY,
        model="deepgram/aura:2",
        voice="aura-2-thalia-en",
        language="en",
        sample_rate=24000,
    )

    session = AgentSession(
        stt=stt,
        tts=tts,
        vad=silero.VAD.load(),
        turn_detection="vad",
        allow_interruptions=True,
    )
    await session.start(agent=MyAgent(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
## Model identifiers

Models follow the format `provider/model:variant`. Prefix with `slng/` to target an SLNG-hosted instance:

```text
provider/model:variant       # third-party passthrough
slng/provider/model:variant  # SLNG-hosted
```
Examples:

```python
model="deepgram/nova:3"              # Deepgram Nova 3 (passthrough)
model="slng/deepgram/nova:3-en"      # SLNG-hosted Deepgram Nova 3, English
model="elevenlabs/eleven-flash:2.5"  # ElevenLabs Flash v2.5 (passthrough)
```
See the Models page for the full list of available models.
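The identifier grammar above can be sketched as a small parser. `parse_model_id` is a hypothetical helper for illustration; the plugin itself simply passes the string through to the gateway:

```python
def parse_model_id(model: str) -> dict:
    """Split a model identifier into hosting, provider, model name, and variant."""
    hosted = model.startswith("slng/")
    rest = model[len("slng/"):] if hosted else model
    provider, _, name = rest.partition("/")
    model_name, _, variant = name.partition(":")
    return {
        "slng_hosted": hosted,
        "provider": provider,
        "model": model_name,
        "variant": variant or None,
    }
```
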
## STT reference

`slng.STT` streams speech-to-text over WebSocket and supports multi-endpoint failover.

### Constructor
```python
stt = slng.STT(
    api_key="your-slng-api-key",      # Required. SLNG API key.
    model="deepgram/nova:3",          # Model identifier. Default: "deepgram/nova:3"
    language="en",                    # Language code. Default: "en"
    sample_rate=16000,                # Audio sample rate in Hz. Default: 16000
    encoding="pcm_s16le",             # "pcm_s16le" or "pcm_mulaw". Default: "pcm_s16le"
    buffer_size_seconds=0.064,        # Audio buffer size in seconds. Default: 0.064
    enable_partial_transcripts=True,  # Enable interim results. Default: True
    enable_diarization=False,         # Enable speaker identification. Default: False
    min_speakers=None,                # Minimum speakers for diarization. Default: None
    max_speakers=None,                # Maximum speakers for diarization. Default: None
    vad_threshold=0.5,                # Voice activity detection threshold. Default: 0.5
    vad_min_silence_duration_ms=300,  # Minimum silence for VAD (ms). Default: 300
    vad_speech_pad_ms=30,             # Speech padding for VAD (ms). Default: 30
    model_endpoint=None,              # Optional explicit WebSocket endpoint URL
    model_endpoints=None,             # Optional list of failover endpoints
)
```
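To relate `buffer_size_seconds` to bytes on the wire: at the default 16000 Hz with 16-bit (2-byte) PCM, a 0.064 s buffer holds 1024 samples, i.e. 2048 bytes. A quick sanity check (plain arithmetic, not plugin code; `pcm_mulaw` would be 1 byte per sample):

```python
def buffer_bytes(sample_rate: int, buffer_seconds: float, bytes_per_sample: int = 2) -> int:
    """Bytes per audio buffer: samples per buffer times sample width."""
    return int(sample_rate * buffer_seconds) * bytes_per_sample


# Defaults: 16 kHz, 0.064 s, pcm_s16le -> 1024 samples, 2048 bytes per buffer
```
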
### Endpoint failover

Pass a list of endpoints to `model_endpoints`. If the first one fails, the plugin tries the next in order:

```python
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model_endpoints=[
        "wss://api.slng.ai/v1/stt/deepgram/nova:3",
        "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en",
    ],
    language="en",
)
```
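The failover behavior amounts to trying each endpoint in order until one connects. A minimal sketch of that loop, with a hypothetical `connect` callable standing in for the plugin's WebSocket dial:

```python
from typing import Callable, Iterable


def connect_with_failover(endpoints: Iterable[str], connect: Callable[[str], object]):
    """Try each endpoint in order; return the first successful connection.

    `connect` stands in for the real WebSocket dial and should raise on failure.
    """
    last_error: Exception | None = None
    for url in endpoints:
        try:
            return connect(url)
        except Exception as exc:  # sketch only; real code would narrow this
            last_error = exc
    raise ConnectionError(f"all endpoints failed: {last_error}")
```
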
### Default endpoint

If `model_endpoint` is omitted, the plugin connects to:

```text
wss://api.slng.ai/v1/stt/{model}
```
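The same URL scheme applies to TTS (with `tts` in place of `stt`, as shown in the TTS reference). A sketch of how such an endpoint could be built from a model identifier; `default_endpoint` is a hypothetical helper mirroring the documented pattern, with the model string embedded verbatim, `slng/` prefix included:

```python
BASE_URL = "wss://api.slng.ai/v1"


def default_endpoint(kind: str, model: str) -> str:
    """Build the default gateway endpoint for an STT or TTS model."""
    if kind not in ("stt", "tts"):
        raise ValueError("kind must be 'stt' or 'tts'")
    return f"{BASE_URL}/{kind}/{model}"
```
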
## TTS reference

`slng.TTS` streams text-to-speech over WebSocket with connection pooling.

### Constructor
```python
tts = slng.TTS(
    api_key="your-slng-api-key",  # Required. SLNG API key.
    model="deepgram/aura:2",      # Required. Model identifier.
    voice="aura-2-thalia-en",     # Voice identifier. Default: "default"
    language="en",                # Language code. Default: "en"
    sample_rate=24000,            # Audio sample rate in Hz. Default: 24000
    speed=1.0,                    # Speech speed multiplier. Default: 1.0
    model_endpoint=None,          # Optional explicit WebSocket endpoint URL
)
```
### Streaming vs batch

`tts.stream()` sends text word-by-word and returns audio chunks in real time. Use it for voice agents.

`tts.synthesize(text)` does one-shot synthesis. It works fine for previews, but `stream()` is the better fit for interactive agents.
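The word-by-word behavior of `stream()` can be pictured as splitting an incremental text feed into word-sized sends. A sketch assuming simple whitespace tokenization (the plugin's actual chunking may differ):

```python
from typing import Iterable, Iterator


def word_chunks(text_parts: Iterable[str]) -> Iterator[str]:
    """Yield complete words from an incremental text stream.

    Buffers partial words until whitespace confirms a boundary, so each
    send to the synthesizer is a whole word.
    """
    pending = ""
    for part in text_parts:
        pending += part
        while True:
            word, sep, rest = pending.partition(" ")
            if not sep:
                break
            if word:
                yield word
            pending = rest
    if pending:
        yield pending
```
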
### Default endpoint

If `model_endpoint` is omitted, the plugin connects to:

```text
wss://api.slng.ai/v1/tts/{model}
```
### Voice selection

Pick a voice that matches your chosen model. See the Voices pages for what’s available per provider.
### Notes

The plugin outputs linear16 PCM audio internally and registers itself with LiveKit on import. Both STT and TTS authenticate with `api_key`. When new models are added to the SLNG gateway, you can use them right away without updating the plugin.
## Next steps
- Browse available Models for STT and TTS
- Check the Voices pages for voice options per provider
- See Voice Agents for the SLNG-managed agents API