pipecat-slng adds STT and TTS services for Pipecat. It routes your pipeline through the SLNG
gateway, so you can use any STT or TTS model on SLNG (Deepgram, ElevenLabs, Rime, Sarvam, and more)
behind one API key. To switch provider, swap the model string (and, for TTS, the voice that goes
with it); no other code changes are needed.
Tested with Pipecat v1.3.0.
Prerequisites
Installation
uv add pipecat-slng
# or
pip install pipecat-slng
Credentials
You need an SLNG API key. Export it as the SLNG_API_KEY environment variable:
export SLNG_API_KEY="your-slng-api-key"
Then pass it to each service via api_key:
import os

from pipecat_slng import SlngSTTService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
)
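os.getenv returns None when the variable is unset, so a missing key may only surface once a service tries to connect. A minimal sketch of a fail-fast lookup follows; require_env is a hypothetical helper for illustration and is not part of pipecat-slng.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of pipecat-slng.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return value
```

You could then pass api_key=require_env("SLNG_API_KEY") instead of os.getenv(...), so a misconfigured environment fails at startup.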
Quickstart
Create an STT and TTS service, then add them to your Pipecat pipeline:
import os

from pipecat_slng import SlngSTTService, SlngTTSService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
)

tts = SlngTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/aura:2-en",
    voice="aura-2-thalia-en",
)
SlngSTTService and SlngTTSService stream over WebSocket: low latency, with mid-utterance
interruption support. Common runtime knobs are top-level keyword arguments: language on both
services, speed on TTS, and enable_vad and enable_partials on STT. For richer overrides, pass a
SlngSTTSettings(...) or SlngTTSSettings(...) to settings=.
Region routing
Both services support gateway region routing. Pin requests to a specific datacenter with
region_override, or constrain them to a broad geographic zone with world_part_override. When both
are set, region_override wins.
stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
    region_override="eu-north-1",  # ap-southeast-2 | eu-north-1 | us-east-1
    world_part_override="eu",      # ap | eu | na
)
The WebSocket services send these as the X-Region-Override and X-World-Part-Override headers; the
HTTP service (below) sends them as the region and world-part query parameters.
See the full region list and override behavior at docs.slng.ai/region-override.
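To make the two transports concrete, here is a minimal sketch of how the overrides could map onto headers and query parameters. The header and parameter names come from the text above; the function names are hypothetical, and the sketch assumes that unset overrides are simply omitted and that precedence between the two is resolved by the gateway rather than by the client.

```python
from typing import Dict, Optional


def routing_headers(
    region_override: Optional[str] = None,
    world_part_override: Optional[str] = None,
) -> Dict[str, str]:
    """Region-routing headers as the WebSocket services would send them.

    Assumption: an override left as None is omitted entirely.
    """
    headers: Dict[str, str] = {}
    if region_override:
        headers["X-Region-Override"] = region_override
    if world_part_override:
        headers["X-World-Part-Override"] = world_part_override
    return headers


def routing_query(
    region_override: Optional[str] = None,
    world_part_override: Optional[str] = None,
) -> Dict[str, str]:
    """The same overrides expressed as query parameters for the HTTP service."""
    params: Dict[str, str] = {}
    if region_override:
        params["region"] = region_override
    if world_part_override:
        params["world-part"] = world_part_override
    return params
```

This is only a mental model for debugging traffic; the services build these values for you from region_override and world_part_override.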
Full voice agent example
A complete cascade pipeline — Speech-to-Text → LLM → Text-to-Speech — using SLNG for STT and TTS and
OpenAI for the LLM. The bot introduces itself when a client connects:
import os

from dotenv import load_dotenv
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments, SmallWebRTCRunnerArguments
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat_slng import SlngSTTService, SlngTTSService

load_dotenv(override=True)


async def run_bot(transport: BaseTransport):
    slng_api_key = os.environ["SLNG_API_KEY"]

    stt = SlngSTTService(
        api_key=slng_api_key,
        model="slng/deepgram/nova:3-en",
        language=Language.EN,
        enable_vad=True,
        enable_partials=True,
        # region_override="eu-north-1",  # uncomment to pin to a datacenter
    )

    # Text-to-Speech (streaming WebSocket — low latency, supports interruption).
    # Deepgram Aura 2 supports `speed`; Rime / Sarvam don't (parameter-coverage
    # table on docs.slng.ai). Swap model= and voice= to change provider.
    tts = SlngTTSService(
        api_key=slng_api_key,
        model="slng/deepgram/aura:2-en",
        voice="aura-2-arcas-en",
        language=Language.EN,
        speed=1,
        # region_override="eu-north-1",
    )

    llm = OpenAIResponsesLLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAIResponsesLLMService.Settings(
            model=os.getenv("OPENAI_MODEL", "gpt-4.1"),
            system_instruction=(
                "You are a helpful assistant in a voice conversation. "
                "Your responses will be spoken aloud, so avoid emojis, bullet points, "
                "or other formatting that can't be spoken."
            ),
        ),
    )

    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
    )

    @task.rtvi.event_handler("on_client_ready")
    async def on_client_ready(rtvi):
        context.add_message({"role": "user", "content": "Please introduce yourself."})
        await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    match runner_args:
        case SmallWebRTCRunnerArguments():
            from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport

            transport = SmallWebRTCTransport(
                webrtc_connection=runner_args.webrtc_connection,
                params=TransportParams(audio_in_enabled=True, audio_out_enabled=True),
            )
            await run_bot(transport)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
The full example, including the Daily transport branch, lives in
examples/bot.py. Run it with:
cp .env.example .env # set SLNG_API_KEY and OPENAI_API_KEY
uv run --extra example examples/bot.py
Then open http://localhost:7860/client and start talking. It uses the SmallWebRTC transport by
default; pass -t daily to use Daily instead (requires pipecat-ai[daily]).
Model identifiers
Models follow the format provider/model:variant. Prefix with slng/ to target an SLNG-hosted instance,
and suffix the language where the model exposes per-language variants:
model="slng/deepgram/nova:3-en" # SLNG-hosted Deepgram Nova 3, English (STT)
model="slng/deepgram/aura:2-en" # SLNG-hosted Deepgram Aura 2, English (TTS)
The plugin routes through the SLNG Unmute bridge, so the full list of models you can pass to model= is
the bridge’s supported-models list — see Supported models. Not every
model accepts every option (for example speed on TTS); check the
parameter coverage table before tuning.
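As a quick illustration of the identifier format, the sketch below splits a model string into its parts. parse_model_id and ModelId are hypothetical names used only here; the plugin itself takes the string as-is, and the sketch keeps the variant (including any language suffix) as one opaque value rather than guessing how providers structure it.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    hosted: bool   # True when the identifier starts with "slng/"
    provider: str  # e.g. "deepgram"
    model: str     # e.g. "nova"
    variant: str   # e.g. "3-en", kept whole


def parse_model_id(identifier: str) -> ModelId:
    """Split '[slng/]provider/model:variant' into its components.

    Hypothetical helper for illustration; not part of pipecat-slng.
    """
    parts = identifier.split("/")
    hosted = parts[0] == "slng"
    if hosted:
        parts = parts[1:]
    if len(parts) != 2:
        raise ValueError(f"expected [slng/]provider/model:variant, got {identifier!r}")
    provider, rest = parts
    model, sep, variant = rest.partition(":")
    if not (provider and model and sep and variant):
        raise ValueError(f"expected [slng/]provider/model:variant, got {identifier!r}")
    return ModelId(hosted, provider, model, variant)
```

A check like this can catch a mistyped identifier before the service tries to connect, but the bridge's supported-models list remains the source of truth.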
STT reference
SlngSTTService streams speech-to-text over WebSocket, connecting to
wss://api.slng.ai/v1/bridges/unmute/stt/{model}.
Constructor
stt = SlngSTTService(
    api_key="your-slng-api-key",      # Required. SLNG API key.
    model="slng/deepgram/nova:3-en",  # Model identifier. Default: "slng/deepgram/nova:3-en"
    base_url="api.slng.ai",           # Gateway host (self-hosted or staging). Default: "api.slng.ai"
    encoding="linear16",              # "linear16", "mp3", or "opus". Default: "linear16"
    sample_rate=None,                 # Audio sample rate in Hz. Default: the pipeline sample rate
    language=Language.EN,             # Recognition language. Default: English
    enable_vad=True,                  # Enable server-side VAD. Default: True
    enable_partials=True,             # Stream interim (partial) transcripts. Default: True
    region_override=None,             # Pin to a datacenter, sent as X-Region-Override
    world_part_override=None,         # Constrain to a zone, sent as X-World-Part-Override
    settings=None,                    # Optional SlngSTTSettings for runtime updates
)
Language is imported from pipecat.transcriptions.language.
Confidence filter
When the provider surfaces a confidence score, transcripts below 0.5 are dropped before reaching your
pipeline.
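The rule can be sketched as a small predicate. The 0.5 threshold is the one stated above; should_forward is a hypothetical name, and the sketch assumes a missing score is represented as None.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.5  # threshold stated in the section above


def should_forward(
    confidence: Optional[float],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> bool:
    """Return True if a transcript should reach the pipeline.

    Hypothetical helper for illustration; not part of pipecat-slng.
    """
    if confidence is None:
        # The provider reported no score, so there is nothing to filter on.
        return True
    # Only transcripts strictly below the threshold are dropped.
    return confidence >= threshold
```

In practice this means a transcript with no score always passes, while a low-confidence one never reaches your aggregator.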
Default endpoint
The plugin connects to:
wss://api.slng.ai/v1/bridges/unmute/stt/{model}
TTS reference (streaming)
SlngTTSService streams text-to-speech over WebSocket, connecting to
wss://api.slng.ai/v1/bridges/unmute/tts/{model}. This is the recommended path for interactive voice
agents.
Constructor
tts = SlngTTSService(
    api_key="your-slng-api-key",      # Required. SLNG API key.
    model="slng/deepgram/aura:2-en",  # Model identifier. Default: "slng/deepgram/aura:2-en"
    voice="aura-2-thalia-en",         # Voice identifier. Default: None (server default)
    base_url="api.slng.ai",           # Gateway host. Default: "api.slng.ai"
    encoding="linear16",              # "linear16", "mp3", "opus", "mulaw", or "alaw". Default: "linear16"
    sample_rate=None,                 # Audio sample rate in Hz. Default: the pipeline sample rate
    language=Language.EN,             # Synthesis language. Default: English
    speed=None,                       # Speech speed multiplier. Default: None (server default)
    region_override=None,             # Pin to a datacenter, sent as X-Region-Override
    world_part_override=None,         # Constrain to a zone, sent as X-World-Part-Override
    settings=None,                    # Optional SlngTTSSettings for runtime updates
)
Runtime settings updates
Changing voice, speed, or language mid-session (via Pipecat settings updates) reconnects the
WebSocket to re-run the init handshake. Expect a brief reconnect, not a silent no-op.
Default endpoint
The plugin connects to:
wss://api.slng.ai/v1/bridges/unmute/tts/{model}
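If you point base_url at a self-hosted or staging gateway, it can help to know what URL to expect in logs. The sketch below rebuilds the documented pattern for both bridges; bridge_ws_url is a hypothetical helper, and it assumes the model identifier is placed in the path verbatim, which may not match how the plugin encodes it.

```python
def bridge_ws_url(kind: str, model: str, base_url: str = "api.slng.ai") -> str:
    """Build the documented WebSocket endpoint for the STT or TTS bridge.

    Hypothetical helper for illustration; not part of pipecat-slng.
    Assumption: the model string is inserted without percent-encoding.
    """
    if kind not in ("stt", "tts"):
        raise ValueError(f"kind must be 'stt' or 'tts', got {kind!r}")
    return f"wss://{base_url}/v1/bridges/unmute/{kind}/{model}"
```

For example, bridge_ws_url("tts", "slng/deepgram/aura:2-en") yields the default TTS endpoint shown above with {model} filled in.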
Voice selection
Pick a voice that matches your chosen model. See the Voices pages for what’s
available per provider.
HTTP TTS (non-streaming fallback)
For simple request/response synthesis where streaming is not required, use SlngHttpTTSService. It
issues one HTTP POST per utterance and returns the full audio body in a single frame.
import os

from pipecat_slng import SlngHttpTTSService

tts = SlngHttpTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/aura:2-en",
    voice="aura-2-thalia-en",
)
The HTTP bridge body accepts only {text, voice}; there is no config object. The encoding,
sample_rate, language, and speed options are therefore not configurable over HTTP, and the server
returns its default audio format. The language and speed arguments are kept for API parity with the
WebSocket service but are not sent over the wire.
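A minimal sketch of what that request body looks like, to show which arguments reach the wire. build_http_tts_body is a hypothetical helper, and omitting voice when it is None is an assumption rather than documented behavior.

```python
import json
from typing import Optional


def build_http_tts_body(
    text: str,
    voice: Optional[str] = None,
    language: Optional[str] = None,  # accepted for parity, never sent
    speed: Optional[float] = None,   # accepted for parity, never sent
) -> bytes:
    """Serialize the {text, voice} JSON body described above.

    Hypothetical helper for illustration; not part of pipecat-slng.
    Assumption: voice is left out of the body when it is None.
    """
    body = {"text": text}
    if voice is not None:
        body["voice"] = voice
    return json.dumps(body).encode("utf-8")
```

Passing speed=1.2 here changes nothing in the output, which mirrors why tuning speed on the HTTP service has no effect.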
The service auto-detects WAV (decoded to raw PCM at the file’s sample rate) and plain PCM (passed
through at the pipeline’s sample rate). Compressed responses (MP3/Ogg) yield an ErrorFrame — use the
streaming SlngTTSService if you need codec control. Pass aiohttp_session= to reuse a shared
aiohttp.ClientSession; otherwise one is created internally.
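The detection logic described above can be approximated with the standard library. This is a sketch under assumptions, not the service's implementation: decode_tts_response is a hypothetical name, it raises ValueError where the service yields an ErrorFrame, and it recognizes compressed audio only by the OggS and ID3 magic tags (headerless MP3 frames are not detected here, because their sync bytes can also occur in raw PCM).

```python
import io
import wave
from typing import Tuple


def decode_tts_response(body: bytes, pipeline_sample_rate: int) -> Tuple[bytes, int]:
    """Return (raw_pcm, sample_rate) for a WAV or raw-PCM response body.

    Hypothetical helper for illustration; not part of pipecat-slng.
    """
    # WAV: a RIFF container whose form type is WAVE. Decode to raw PCM and
    # report the sample rate stored in the file header.
    if body[:4] == b"RIFF" and body[8:12] == b"WAVE":
        with wave.open(io.BytesIO(body), "rb") as wav:
            return wav.readframes(wav.getnframes()), wav.getframerate()

    # Compressed audio identified by its magic tag: Ogg pages or an ID3-tagged MP3.
    if body[:4] == b"OggS" or body[:3] == b"ID3":
        raise ValueError("compressed audio is not supported; use the streaming service")

    # Anything else is treated as plain PCM at the pipeline's sample rate.
    return body, pipeline_sample_rate
```

The key point is where the sample rate comes from: the file header for WAV, the pipeline for plain PCM.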
Region routing on the HTTP service uses the region and world-part query parameters instead of
headers.
Good to know
Both WebSocket services default to linear16 PCM encoding and authenticate with api_key. The package
exports SlngSTTService, SlngTTSService, and SlngHttpTTSService, plus the SlngSTTSettings and
SlngTTSSettings settings classes.
Prefer the streaming SlngTTSService for conversational agents — it supports mid-utterance
interruption. Reserve SlngHttpTTSService for batch or non-interactive synthesis.
Next steps