Pipecat plugin for SLNG

pipecat-slng adds STT and TTS services for Pipecat. It routes your pipeline through the SLNG gateway, so you can use any STT or TTS model on SLNG — Deepgram, ElevenLabs, Rime, Sarvam, and more — behind one API key. Swap the model string to switch provider; no other code changes needed.

Tested with Pipecat v1.3.0. BYOK requires pipecat-slng 0.4.0 or later.

Prerequisites

Python 3.11+
pipecat-ai>=1.3.0
A Pipecat project
An SLNG API key (get one at app.slng.ai)

Installation

uv add pipecat-slng
# or
pip install pipecat-slng

Credentials

You need an SLNG API key. Export it as the SLNG_API_KEY environment variable:

export SLNG_API_KEY="your-slng-api-key"

Then pass it to each service via api_key:

import os

from pipecat_slng import SlngSTTService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
)
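
os.getenv returns None when the variable is unset, so a missing key may not surface until the service is constructed or tries to connect. A minimal sketch of a fail-fast lookup follows; the helper name require_env is hypothetical and is not part of pipecat-slng.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    # Treat an unset variable and an empty/whitespace-only one the same way.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the bot")
    return value
```

You could then pass api_key=require_env("SLNG_API_KEY") instead of os.getenv("SLNG_API_KEY").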

Quickstart

Create an STT service and a TTS service, then add them to your Pipecat pipeline (the full voice agent example further down shows the pipeline wiring):

import os

from pipecat_slng import SlngSTTService, SlngTTSService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
)

tts = SlngTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/aura:2-en",
    voice="aura-2-thalia-en",
)

SlngSTTService and SlngTTSService both stream over WebSocket, which keeps latency low and supports mid-utterance interruption. Common runtime knobs are top-level keyword arguments: language on both services, enable_vad and enable_partials on STT, and speed on TTS. For richer overrides, pass a SlngSTTSettings(...) or SlngTTSSettings(...) to settings=.

Region routing

Both services support gateway region routing. Pin requests to a specific datacenter with region_override, or constrain them to a broad geographic zone with world_part_override. When both are set, region_override wins.

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
    region_override="eu-north-1",      # ap-southeast-2 | eu-north-1 | us-east-1
    world_part_override="eu",          # ap | eu | na
)

The WebSocket services send these as the X-Region-Override and X-World-Part-Override headers; the HTTP service (below) sends them as the region and world-part query parameters. See the full region list and override behavior at docs.slng.ai/region-override.
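
The mapping from the two keyword arguments to the wire format can be sketched as below. The header and query-parameter names come from the paragraph above; the function names are hypothetical, and omitting unset values is an assumption about how the plugin behaves rather than documented behavior.

```python
from typing import Optional


def region_headers(
    region_override: Optional[str] = None,
    world_part_override: Optional[str] = None,
) -> dict:
    """Headers as the WebSocket services would send them (unset values omitted)."""
    headers = {}
    if region_override:
        headers["X-Region-Override"] = region_override
    if world_part_override:
        headers["X-World-Part-Override"] = world_part_override
    return headers


def region_query_params(
    region_override: Optional[str] = None,
    world_part_override: Optional[str] = None,
) -> dict:
    """Query parameters as the HTTP service would send them (unset values omitted)."""
    params = {}
    if region_override:
        params["region"] = region_override
    if world_part_override:
        params["world-part"] = world_part_override
    return params
```

Both values are forwarded when both are set; the precedence of region_override is applied by the gateway, not by this sketch.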

Bring your own key (BYOK)

If you already have a contract with an upstream provider, pass your own provider key via provider_key. All three services (SlngSTTService, SlngTTSService, and SlngHttpTTSService) forward it as the X-Slng-Provider-Key header: the streaming services send it on the WebSocket upgrade, and SlngHttpTTSService sends it on each request. The provider then bills your account directly and no SLNG audio-minute fees apply, while the SLNG cache still applies on top. See Bring your own key for caching behavior and the supported provider list.

import os

from pipecat_slng import SlngSTTService, SlngTTSService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="deepgram/nova:3",            # external route — no slng/ prefix
    provider_key=os.getenv("SLNG_PROVIDER_KEY"),
)

tts = SlngTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="deepgram/aura:2",            # external route — no slng/ prefix
    voice="aura-2-thalia-en",
    provider_key=os.getenv("SLNG_PROVIDER_KEY"),
)

BYOK only works on external catalog routes — model strings without the slng/ prefix, such as deepgram/nova:3 or deepgram/aura:2. SLNG-hosted slng/... routes reject the header with an HTTP 400 (“BYOK is only supported for external STT/TTS routes”).
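
Because the gateway only rejects a misrouted BYOK request at connect time, it can help to catch the mistake before constructing a service. A minimal sketch of such a pre-flight check follows; check_byok_route is a hypothetical helper, not something pipecat-slng provides.

```python
from typing import Optional

SLNG_HOSTED_PREFIX = "slng/"


def check_byok_route(model: str, provider_key: Optional[str]) -> None:
    """Raise early if a provider key is paired with an SLNG-hosted route."""
    if provider_key and model.startswith(SLNG_HOSTED_PREFIX):
        raise ValueError(
            f"BYOK needs an external route, but {model!r} is SLNG-hosted; "
            f"drop the {SLNG_HOSTED_PREFIX!r} prefix or omit provider_key"
        )
```

Calling check_byok_route(model, provider_key) just before building SlngSTTService or SlngTTSService turns the gateway's HTTP 400 into a local error with the offending model string in the message.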

If the upstream provider rejects your key, the failure surfaces as a backend_connection_failed error frame over WebSocket, or as the upstream 401/403 with the X-Slng-Auth-Source: client_key response header over HTTP. Since pipecat-slng 0.4.0, WebSocket connect-rejection errors include the server’s response body, so a misrouted BYOK request reports the reason rather than a bare HTTP 400.

Full voice agent example

A complete cascade pipeline — Speech-to-Text → LLM → Text-to-Speech — using SLNG for STT and TTS and OpenAI for the LLM. The bot introduces itself when a client connects:

import os

from dotenv import load_dotenv

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments, SmallWebRTCRunnerArguments
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams

from pipecat_slng import SlngSTTService, SlngTTSService

load_dotenv(override=True)


async def run_bot(transport: BaseTransport):
    slng_api_key = os.environ["SLNG_API_KEY"]

    stt = SlngSTTService(
        api_key=slng_api_key,
        model="slng/deepgram/nova:3-en",
        language=Language.EN,
        enable_vad=True,
        enable_partials=True,
        # region_override="eu-north-1",  # uncomment to pin to a datacenter
    )

    # Text-to-Speech (streaming WebSocket — low latency, supports interruption).
    # Deepgram Aura 2 supports `speed`; Rime / Sarvam don't (parameter-coverage
    # table on docs.slng.ai). Swap model= and voice= to change provider.
    tts = SlngTTSService(
        api_key=slng_api_key,
        model="slng/deepgram/aura:2-en",
        voice="aura-2-arcas-en",
        language=Language.EN,
        speed=1,
        # region_override="eu-north-1",
    )

    llm = OpenAIResponsesLLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAIResponsesLLMService.Settings(
            model=os.getenv("OPENAI_MODEL", "gpt-4.1"),
            system_instruction=(
                "You are a helpful assistant in a voice conversation. "
                "Your responses will be spoken aloud, so avoid emojis, bullet points, "
                "or other formatting that can't be spoken."
            ),
        ),
    )

    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
    )

    @task.rtvi.event_handler("on_client_ready")
    async def on_client_ready(rtvi):
        context.add_message({"role": "user", "content": "Please introduce yourself."})
        await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    match runner_args:
        case SmallWebRTCRunnerArguments():
            from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport

            transport = SmallWebRTCTransport(
                webrtc_connection=runner_args.webrtc_connection,
                params=TransportParams(audio_in_enabled=True, audio_out_enabled=True),
            )
            await run_bot(transport)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()

The full example, including the Daily transport branch, lives in examples/bot.py. Run it with:

cp .env.example .env   # set SLNG_API_KEY and OPENAI_API_KEY
uv run --extra example examples/bot.py

Then open http://localhost:7860/client and start talking. It uses the SmallWebRTC transport by default; pass -t daily to use Daily instead (requires pipecat-ai[daily]). Setting SLNG_PROVIDER_KEY (your own Deepgram key) in .env flips the example into BYOK mode on the external deepgram/nova:3 / deepgram/aura:2 routes.
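
The BYOK switch described above might look like the following. This is a guess at the shape of the logic, not a copy of examples/bot.py: select_routes is a hypothetical name, and the model strings are the ones used elsewhere on this page.

```python
import os
from typing import Mapping, Optional


def select_routes(env: Mapping[str, str] = os.environ) -> dict:
    """Pick hosted or external Deepgram routes depending on SLNG_PROVIDER_KEY."""
    provider_key: Optional[str] = env.get("SLNG_PROVIDER_KEY") or None
    if provider_key:
        # BYOK mode: external routes, no slng/ prefix.
        return {
            "stt_model": "deepgram/nova:3",
            "tts_model": "deepgram/aura:2",
            "provider_key": provider_key,
        }
    # Default mode: SLNG-hosted routes, no provider key forwarded.
    return {
        "stt_model": "slng/deepgram/nova:3-en",
        "tts_model": "slng/deepgram/aura:2-en",
        "provider_key": None,
    }
```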

Model identifiers

Models follow the format provider/model:variant. Prefix with slng/ to target an SLNG-hosted instance, and suffix the language where the model exposes per-language variants:

model="slng/deepgram/nova:3-en"      # SLNG-hosted Deepgram Nova 3, English (STT)
model="slng/deepgram/aura:2-en"      # SLNG-hosted Deepgram Aura 2, English (TTS)

Model strings without the slng/ prefix are external routes, proxied to the provider’s own API:

model="deepgram/nova:3"              # Proxied to Deepgram Nova 3 (STT)
model="deepgram/aura:2"              # Proxied to Deepgram Aura 2 (TTS)

External routes are required for BYOK. The plugin routes through the SLNG Unmute bridge, so the full list of models you can pass to model= is the bridge’s supported-models list — see Supported models. Not every model accepts every option (for example speed on TTS); check the parameter coverage table before tuning.
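
To make the identifier format concrete, here is a small parser for the [slng/]provider/model:variant shape described above. It is an illustrative sketch only: ModelRoute and parse_model are hypothetical names, the plugin may not parse model strings at all, and any language suffix is left inside the variant rather than split out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRoute:
    hosted: bool    # True when the string starts with "slng/"
    provider: str   # e.g. "deepgram"
    model: str      # e.g. "nova"
    variant: str    # everything after ":", including any language suffix


def parse_model(identifier: str) -> ModelRoute:
    """Split a model string into its hosted flag, provider, model, and variant."""
    hosted = identifier.startswith("slng/")
    rest = identifier[len("slng/"):] if hosted else identifier
    provider, slash, tail = rest.partition("/")
    model, colon, variant = tail.partition(":")
    if not (slash and colon and provider and model and variant):
        raise ValueError(f"expected [slng/]provider/model:variant, got {identifier!r}")
    return ModelRoute(hosted, provider, model, variant)
```

For example, parse_model("slng/deepgram/nova:3-en").hosted is True, while parse_model("deepgram/nova:3").hosted is False, which is the distinction that decides whether BYOK is allowed.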

STT reference

SlngSTTService streams speech-to-text over WebSocket, connecting to wss://api.slng.ai/v1/bridges/unmute/stt/{model}.

Constructor

stt = SlngSTTService(
    api_key="your-slng-api-key",          # Required. SLNG API key.
    model="slng/deepgram/nova:3-en",      # Model identifier. Default: "slng/deepgram/nova:3-en"
    base_url="api.slng.ai",               # Gateway host (self-hosted or staging). Default: "api.slng.ai"
    encoding="linear16",                   # "linear16", "mp3", or "opus". Default: "linear16"
    sample_rate=None,                      # Audio sample rate in Hz. Default: the pipeline sample rate
    language=Language.EN,                  # Recognition language. Default: English
    enable_vad=True,                       # Enable server-side VAD. Default: True
    enable_partials=True,                  # Stream interim (partial) transcripts. Default: True
    region_override=None,                  # Pin to a datacenter, sent as X-Region-Override
    world_part_override=None,              # Constrain to a zone, sent as X-World-Part-Override
    provider_key=None,                     # Your own provider key (BYOK), sent as X-Slng-Provider-Key — external routes only
    settings=None,                         # Optional SlngSTTSettings for runtime updates
)

Language is imported from pipecat.transcriptions.language.

Confidence filter

When the provider surfaces a confidence score, transcripts below 0.5 are dropped before reaching your pipeline.
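
The rule can be read as a simple predicate. The sketch below assumes that a transcript with no confidence score is always kept and that a score of exactly 0.5 passes, which matches the wording above but is not confirmed against the plugin source; keep_transcript is a hypothetical name.

```python
from typing import Optional

MIN_CONFIDENCE = 0.5  # threshold stated in the text above


def keep_transcript(confidence: Optional[float], threshold: float = MIN_CONFIDENCE) -> bool:
    """Return True if a transcript should be forwarded to the pipeline."""
    if confidence is None:
        # The provider did not surface a score, so there is nothing to filter on.
        return True
    return confidence >= threshold
```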

Default endpoint

The plugin connects to:

wss://api.slng.ai/v1/bridges/unmute/stt/{model}
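
As an illustration of how base_url and model combine into the endpoint, the sketch below fills in the documented template for either the STT or the TTS bridge. bridge_url is a hypothetical helper; it does plain string substitution, and whether the plugin escapes any characters in the model string is not stated here.

```python
def bridge_url(kind: str, model: str, base_url: str = "api.slng.ai") -> str:
    """Build the Unmute bridge WebSocket URL for kind 'stt' or 'tts'."""
    if kind not in ("stt", "tts"):
        raise ValueError(f"kind must be 'stt' or 'tts', got {kind!r}")
    return f"wss://{base_url}/v1/bridges/unmute/{kind}/{model}"
```

This is mostly useful for logging or for checking connectivity to a self-hosted or staging gateway with the host you pass as base_url.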

TTS reference (streaming)

SlngTTSService streams text-to-speech over WebSocket, connecting to wss://api.slng.ai/v1/bridges/unmute/tts/{model}. This is the recommended path for interactive voice agents.

Constructor

tts = SlngTTSService(
    api_key="your-slng-api-key",          # Required. SLNG API key.
    model="slng/deepgram/aura:2-en",      # Model identifier. Default: "slng/deepgram/aura:2-en"
    voice="aura-2-thalia-en",             # Voice identifier. Default: None (server default)
    base_url="api.slng.ai",               # Gateway host. Default: "api.slng.ai"
    encoding="linear16",                   # "linear16", "mp3", "opus", "mulaw", or "alaw". Default: "linear16"
    sample_rate=None,                      # Audio sample rate in Hz. Default: the pipeline sample rate
    language=Language.EN,                  # Synthesis language. Default: English
    speed=None,                            # Speech speed multiplier. Default: None (server default)
    region_override=None,                  # Pin to a datacenter, sent as X-Region-Override
    world_part_override=None,              # Constrain to a zone, sent as X-World-Part-Override
    provider_key=None,                     # Your own provider key (BYOK), sent as X-Slng-Provider-Key — external routes only
    settings=None,                         # Optional SlngTTSSettings for runtime updates
)

Runtime settings updates

Changing voice, speed, or language mid-session (via Pipecat settings updates) reconnects the WebSocket to re-run the init handshake. Expect a brief reconnect, not a silent no-op.
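
One way to picture this behavior is a diff between the old and new settings: any change to voice, speed, or language means a new init handshake. The sketch below is a conceptual model only. TTSSessionSettings is a hypothetical stand-in, not SlngTTSSettings, and the plugin's real update path is not shown in this document.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class TTSSessionSettings:
    """Hypothetical stand-in for the settings sent in the init handshake."""
    voice: Optional[str] = None
    speed: Optional[float] = None
    language: Optional[str] = None


def changed_fields(old: TTSSessionSettings, new: TTSSessionSettings) -> set:
    """Names of the handshake fields whose values differ between old and new."""
    return {f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)}


def needs_reconnect(old: TTSSessionSettings, new: TTSSessionSettings) -> bool:
    """True when at least one handshake field changed."""
    return bool(changed_fields(old, new))
```

If you change several settings at once, batching them into a single update should cost one reconnect rather than several, assuming the update is applied as a unit.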

Default endpoint

The plugin connects to:

wss://api.slng.ai/v1/bridges/unmute/tts/{model}

Voice selection

Pick a voice that matches your chosen model. See the Voices pages for what’s available per provider.

HTTP TTS (non-streaming fallback)

For simple request/response synthesis where streaming is not required, use SlngHttpTTSService. It issues one HTTP POST per utterance and returns the full audio body in a single frame.

import os

from pipecat_slng import SlngHttpTTSService

tts = SlngHttpTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/aura:2-en",
    voice="aura-2-thalia-en",
)

The HTTP bridge body accepts only {text, voice}; there is no config object. The encoding, sample_rate, language, and speed options are therefore not configurable over HTTP, and the server returns its default audio format. The language and speed arguments are kept for API parity with the WebSocket service but are not sent over the wire.

The service auto-detects WAV (decoded to raw PCM at the file’s sample rate) and plain PCM (passed through at the pipeline’s sample rate). Compressed responses (MP3/Ogg) yield an ErrorFrame; use the streaming SlngTTSService if you need codec control.

Pass aiohttp_session= to reuse a shared aiohttp.ClientSession; otherwise one is created internally. Region routing on the HTTP service uses the region and world-part query parameters instead of headers. BYOK works here too: pass provider_key and it is sent as the X-Slng-Provider-Key header on each request (external routes only).
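
The WAV and PCM handling described in this section can be sketched with the standard library. This illustrates the documented behavior and is not the plugin's actual detection code: decode_http_audio is a hypothetical name, the magic-byte checks are assumptions, and the sketch raises ValueError where the service would emit an ErrorFrame.

```python
import io
import wave


def decode_http_audio(body: bytes, pipeline_sample_rate: int) -> tuple:
    """Return (raw PCM bytes, sample rate in Hz) for a TTS response body."""
    # WAV: RIFF container with a WAVE form type. Decode to PCM at the file's rate.
    if body[:4] == b"RIFF" and body[8:12] == b"WAVE":
        with wave.open(io.BytesIO(body), "rb") as wav:
            return wav.readframes(wav.getnframes()), wav.getframerate()
    # Compressed audio: Ogg pages start with "OggS"; tagged MP3 files start with "ID3".
    # Headerless MP3 streams are not detected here, since their frame sync bytes
    # can also occur at the start of valid raw PCM.
    if body[:4] == b"OggS" or body[:3] == b"ID3":
        raise ValueError("compressed audio (MP3/Ogg) is not supported over HTTP")
    # Anything else is treated as raw PCM at the pipeline's sample rate.
    return body, pipeline_sample_rate
```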

Good to know

Both WebSocket services output linear16 PCM by default and authenticate with api_key. The package exports SlngSTTService, SlngTTSService, and SlngHttpTTSService, plus the SlngSTTSettings and SlngTTSSettings settings classes.

Prefer the streaming SlngTTSService for conversational agents — it supports mid-utterance interruption. Reserve SlngHttpTTSService for batch or non-interactive synthesis.

Next steps

Browse the supported models and parameter coverage for the Unmute bridge
Read Bring your own key for BYOK caching behavior and supported providers
Check the Voices pages for voice options per provider
See Voice Agents for the SLNG-managed agents API
Using LiveKit instead? See the LiveKit plugin
