# Embed an SLNG voice agent on your website

Source: https://docs.slng.ai/agents/embed-web

Add a browser-based voice session with an SLNG voice agent to any web page using LiveKit, a React frontend, and a backend proxy that hides your SLNG key.

You can let visitors talk to an SLNG voice agent directly in the browser. You need a small backend that creates the web session (so your SLNG key stays off the client) and a frontend that connects to LiveKit for real-time audio.

## Placeholders

The snippets below use these placeholders. Replace them with your own values before running the code.

| Placeholder     | Replace with                                                          |
| --------------- | --------------------------------------------------------------------- |
| `SLNG_API_KEY`  | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) |
| `SLNG_AGENT_ID` | The ID of a configured [Voice Agent](/voice-agents)                   |

## Prerequisites

* A configured [Voice Agent](/voice-agents) with its agent ID
* An SLNG key (get one at [app.slng.ai](https://app.slng.ai/api-keys))
* A backend you can deploy server-side code to (Node.js, Deno, Python, etc.)
* A React frontend project (the examples below use React, but the LiveKit client SDK works with any framework)

## How it works

```mermaid
sequenceDiagram
    participant Browser
    participant Backend as Your backend
    participant SLNG as Voice Agents API
    participant LK as LiveKit room

    Browser->>Backend: POST /api/session
    Backend->>SLNG: POST /v1/agents/SLNG_AGENT_ID/web-sessions
    SLNG-->>Backend: livekit_url + livekit_token
    Backend-->>Browser: livekit_url + livekit_token
    Browser->>LK: Connect (token)
    Browser->>LK: Publish mic audio
    LK-->>Browser: Agent audio + transcripts
```

1. The browser asks your backend to start a session.
2. Your backend calls the Voice Agents web-sessions endpoint and forwards the LiveKit credentials back.
3. The browser connects to the LiveKit room, publishes the mic, and plays the agent's audio.
## Step 1: Create a backend endpoint

Your backend proxies the Voice Agents API so the SLNG key never reaches the browser.

Never call the Voice Agents API directly from client-side code. Your `SLNG_API_KEY` must stay server-side.

The only call you need is:

```
POST https://api.agents.slng.ai/v1/agents/SLNG_AGENT_ID/web-sessions
```

```javascript Node.js (Express)
import express from "express";

const app = express();
app.use(express.json());

const SLNG_API_KEY = process.env.SLNG_API_KEY;
const SLNG_AGENT_ID = process.env.SLNG_AGENT_ID;

app.post("/api/session", async (req, res) => {
  // Optional display name sent by the browser (req.body may be undefined without a JSON body)
  const participantName = req.body?.participant_name;

  const response = await fetch(
    `https://api.agents.slng.ai/v1/agents/${SLNG_AGENT_ID}/web-sessions`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${SLNG_API_KEY}`,
      },
      body: JSON.stringify({
        arguments: {},
        ...(participantName ? { participant_name: participantName } : {}),
      }),
    }
  );

  const data = await response.json();
  if (!response.ok) {
    return res.status(response.status).json({ error: "SLNG error", details: data });
  }
  res.json(data);
});

app.listen(3001);
```

```python Python (FastAPI)
import os

import httpx
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel

app = FastAPI()

SLNG_API_KEY = os.environ["SLNG_API_KEY"]
SLNG_AGENT_ID = os.environ["SLNG_AGENT_ID"]


class SessionRequest(BaseModel):
    participant_name: str | None = None


@app.post("/api/session")
async def create_session(body: SessionRequest):
    payload = {"arguments": {}}
    if body.participant_name:
        payload["participant_name"] = body.participant_name

    async with httpx.AsyncClient() as client:
        r = await client.post(
            f"https://api.agents.slng.ai/v1/agents/{SLNG_AGENT_ID}/web-sessions",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {SLNG_API_KEY}",
            },
            json=payload,
        )

    data = r.json()
    if r.is_error:
        # Pass SLNG's status code through, as the Express example does
        return JSONResponse(
            status_code=r.status_code,
            content={"error": "SLNG error", "details": data},
        )
    return data
```

The response includes the fields you need for the frontend:

```json
{
  "livekit_url": "wss://...",
  "livekit_token": "...",
"call_id": "...", "max_session_seconds": 300 } ``` ## Step 2: Install the LiveKit client SDK ```bash theme={null} npm install livekit-client ``` ## Step 3: Connect to the LiveKit room Call your backend to get a session, then connect to the room: ```tsx theme={null} import { Room, RoomEvent, createLocalAudioTrack } from "livekit-client"; async function startSession() { // 1. Get session credentials from your backend const res = await fetch("/api/session", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ participant_name: "visitor" }), }); const session = await res.json(); // 2. Create and connect to the LiveKit room const room = new Room({ adaptiveStream: true, dynacast: true }); await room.connect(session.livekit_url, session.livekit_token); // 3. Publish your microphone const micTrack = await createLocalAudioTrack(); await room.localParticipant.publishTrack(micTrack); return { room, micTrack, session }; } ``` The browser will prompt the user for microphone access on `createLocalAudioTrack()`. If your page is not served over HTTPS, most browsers will block the request. 
## Step 4: Play the agent's audio

Attach the agent's remote audio track to the DOM so the browser plays it:

```tsx
room.on(RoomEvent.TrackSubscribed, (track) => {
  if (track.kind !== "audio") return;
  const el = track.attach();
  el.autoplay = true;
  document.body.appendChild(el);
});

room.on(RoomEvent.TrackUnsubscribed, (track) => {
  track.detach().forEach((el) => el.remove());
});
```

## Step 5: Show live transcripts

Transcript updates arrive over a LiveKit data channel on the `slng.transcript.v1` topic:

```tsx
const TRANSCRIPT_TOPIC = "slng.transcript.v1";

room.on(RoomEvent.DataReceived, (payload, _participant, _kind, topic) => {
  if (topic !== TRANSCRIPT_TOPIC) return;

  const msg = JSON.parse(new TextDecoder().decode(payload));
  if (msg.type === "conversation_item_added" && msg.item) {
    const { id, role, content, created_at } = msg.item;
    const text = Array.isArray(content) ? content.join("\n") : String(content);
    // Append { id, role, content: text, created_at } to your transcript state
  }
});
```

Each transcript item has:

| Field        | Description                                       |
| ------------ | ------------------------------------------------- |
| `id`         | Unique message ID (use it to deduplicate)         |
| `role`       | `"user"` or `"assistant"`                         |
| `content`    | The transcribed text (string or array of strings) |
| `created_at` | Timestamp                                         |

## Step 6: Add mute and disconnect controls

```tsx
// Toggle mute: pass the current muted state, get the new one back
async function toggleMute(micTrack, muted) {
  if (muted) {
    await micTrack.unmute();
  } else {
    await micTrack.mute();
  }
  return !muted;
}

// End the session: release the microphone, then leave the room
async function disconnect(room, micTrack) {
  micTrack.stop();
  await room.disconnect();
}
```

## Step 7: Detect who is speaking

The `ActiveSpeakersChanged` event tells you when the agent is talking, so you can drive a visual indicator or avatar animation:

```tsx
room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
  // Any active speaker other than the local participant is the agent
  const agentIsSpeaking = speakers.some(
    (p) => p.identity !== room.localParticipant.identity
  );
  // Update your UI based on agentIsSpeaking
});
```

## Optional: Add a visual persona

A voice-only interface gives users no visual cue about what the agent is doing. Adding an animated persona (an orb, waveform, or avatar) makes the experience feel more responsive. Two ready-made libraries work well here:

* **[Vercel AI SDK](https://elements.ai-sdk.dev/components/persona)**: a React component with built-in states (`idle`, `listening`, `speaking`, `thinking`). Drop it in and map LiveKit events to states.
* **[ElevenLabs UI](https://ui.elevenlabs.io/docs/components)**: orb and avatar components designed for voice interfaces, with audio-reactive animations.

To wire either library up, map your session and LiveKit events to persona states:

```tsx
// Derive a persona state from your session + LiveKit events
function getPersonaState({ status, muted, agentIsSpeaking }) {
  if (status === "connecting") return "idle";
  if (status === "ended") return "idle";
  if (agentIsSpeaking) return "speaking";
  if (muted) return "thinking";
  return "listening";
}

// Update on active-speaker changes (Step 7); setPersonaState is your own state setter
room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
  const agentIsSpeaking = speakers.some(
    (p) => p.identity !== room.localParticipant.identity
  );
  setPersonaState(getPersonaState({ status, muted, agentIsSpeaking }));
});
```

## Putting it all together

A minimal React component with all the steps above wired together:

```tsx
import { useEffect, useRef, useState, useCallback } from "react";
import {
  Room,
  RoomEvent,
  createLocalAudioTrack,
  type LocalAudioTrack,
  type RemoteTrack,
} from "livekit-client";

const TRANSCRIPT_TOPIC = "slng.transcript.v1";

interface SessionData {
  livekit_url: string;
  livekit_token: string;
  call_id: string;
  max_session_seconds: number;
}

interface TranscriptItem {
  id: string;
  role: "user" | "assistant";
  content: string;
}

export default function VoiceSession() {
  const [status, setStatus] = useState<"idle" | "connecting" | "active" | "ended">("idle");
  const [muted, setMuted] = useState(false);
  const [transcript, setTranscript] = useState<TranscriptItem[]>([]);
  const roomRef = useRef<Room | null>(null);
  const micRef = useRef<LocalAudioTrack | null>(null);

  const start = useCallback(async () => {
    setStatus("connecting");

    // Get session from your backend
    const res = await fetch("/api/session", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({}),
    });
    if (!res.ok) {
      setStatus("ended");
      return;
    }
    const session: SessionData = await res.json();

    const room = new Room({ adaptiveStream: true, dynacast: true });
    roomRef.current = room;

    // Play agent audio
    room.on(RoomEvent.TrackSubscribed, (track: RemoteTrack) => {
      if (track.kind !== "audio") return;
      const el = track.attach() as HTMLAudioElement;
      el.autoplay = true;
      document.body.appendChild(el);
    });
    room.on(RoomEvent.TrackUnsubscribed, (track: RemoteTrack) => {
      track.detach().forEach((el) => el.remove());
    });

    // Transcript
    room.on(
      RoomEvent.DataReceived,
      (payload: Uint8Array, _p: unknown, _k: unknown, topic?: string) => {
        if (topic !== TRANSCRIPT_TOPIC) return;
        try {
          const msg = JSON.parse(new TextDecoder().decode(payload));
          if (msg.type === "conversation_item_added" && msg.item) {
            const { id, role, content } = msg.item;
            const text = Array.isArray(content) ? content.join("\n") : String(content);
            // Skip items already in the transcript (deduplicate by id)
            setTranscript((prev) =>
              prev.find((t) => t.id === id) ? prev : [...prev, { id, role, content: text }]
            );
          }
        } catch {
          // ignore malformed messages
        }
      }
    );

    room.on(RoomEvent.Disconnected, () => setStatus("ended"));

    await room.connect(session.livekit_url, session.livekit_token);

    const micTrack = await createLocalAudioTrack();
    micRef.current = micTrack;
    await room.localParticipant.publishTrack(micTrack);

    setStatus("active");
  }, []);

  const toggleMute = async () => {
    const mic = micRef.current;
    if (!mic) return;
    if (muted) {
      await mic.unmute();
    } else {
      await mic.mute();
    }
    setMuted((m) => !m);
  };

  const disconnect = useCallback(async () => {
    micRef.current?.stop();
    micRef.current = null;
    await roomRef.current?.disconnect();
    roomRef.current = null;
    setStatus("ended");
  }, []);

  // Cleanup on unmount
  useEffect(() => {
    return () => {
      micRef.current?.stop();
      roomRef.current?.disconnect();
    };
  }, []);

  return (
    <div>
      {status === "idle" && <button onClick={start}>Start conversation</button>}
      {status === "connecting" && <p>Connecting…</p>}
      {status === "active" && (
        <div>
          <button onClick={toggleMute}>{muted ? "Unmute" : "Mute"}</button>
          <button onClick={disconnect}>End call</button>
        </div>
      )}
      {status === "ended" && <p>Session ended.</p>}
      <ul>
        {transcript.map((item) => (
          <li key={item.id}>
            {item.role}: {item.content}
          </li>
        ))}
      </ul>
    </div>
  );
}
```

## Next steps

* Set up your agent with custom prompts and tools in the [Dashboard](/dashboard/agent-infra)
* See the [Voice Agents API](/voice-agents) for agent configuration options
* Add phone call support with [Telephony](/dashboard/telephony)
* Add a visual persona with [Vercel AI SDK](https://elements.ai-sdk.dev/components/persona) or [ElevenLabs UI](https://ui.elevenlabs.io/docs/components)

# LiveKit plugin for SLNG

Source: https://docs.slng.ai/agents/livekit-plugin

Use the livekit-plugins-slng Python package to connect LiveKit Agents to any STT or TTS model on the SLNG platform with a single configuration switch.

`livekit-plugins-slng` adds STT and TTS adapters for [LiveKit Agents](https://docs.livekit.io/agents/). It lets you use any model on the SLNG platform from within a LiveKit agent.

## Prerequisites

* Python 3.10+
* `livekit-agents>=1.5.1`
* A [LiveKit Agents](https://docs.livekit.io/agents/) project
* An SLNG key (get one at [app.slng.ai](https://app.slng.ai/api-keys))

## Installation

```bash
uv add livekit-plugins-slng
# or
pip install livekit-plugins-slng
```

## Credentials

You need an SLNG key. The plugin reads it from the `SLNG_API_KEY` environment variable automatically:

```bash
export SLNG_API_KEY="your-slng-api-key"
```

You can also pass it explicitly via `api_key`:

```python
from livekit.plugins import slng

stt = slng.STT(api_key="your-slng-api-key", model="deepgram/nova:3")
```

`slng.STT` also accepts a legacy `api_token=` alias, but it is deprecated. Use `api_key` in new code.
## Quickstart

Create an STT and a TTS instance, then pass them to your LiveKit agent session:

```python
from livekit.plugins import slng

stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    region_override="eu-north-1",
    language="en",
)

tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    region_override=["eu-north-1", "us-east-1"],
    voice="aura-2-thalia-en",
    language="en",
)
```

## Region override

The plugin supports platform region routing through the `region_override` option on both `STT` and `TTS`. This maps directly to the platform's `X-Region-Override` header.

You can pass either a single region:

```python
stt = slng.STT(
    api_key="your-slng-api-key",
    model="deepgram/nova:3",
    region_override="eu-north-1",
)
```

Or multiple preferred regions in priority order:

```python
tts = slng.TTS(
    api_key="your-slng-api-key",
    model="deepgram/aura:2",
    voice="aura-2-thalia-en",
    region_override=["eu-north-1", "us-east-1"],
)
```

See the full region list and override behavior at [docs.slng.ai/region-override](https://docs.slng.ai/region-override).

## Full voice agent example

This example wires up STT, TTS, and VAD into a complete LiveKit agent that greets the user on join:

```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import silero, slng

SLNG_API_KEY = "your-slng-api-key"


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a friendly voice assistant.")

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    stt = slng.STT(
        api_key=SLNG_API_KEY,
        model="deepgram/nova:3",
        language="en",
        sample_rate=16000,
        enable_partial_transcripts=True,
    )

    tts = slng.TTS(
        api_key=SLNG_API_KEY,
        model="deepgram/aura:2",
        voice="aura-2-thalia-en",
        language="en",
        sample_rate=24000,
    )

    # `turn_detection` and `allow_interruptions` still work in livekit-agents 1.5.x
    # but will be removed in v2.0. Use `turn_handling=TurnHandlingOptions(...)` going forward.
    session = AgentSession(
        stt=stt,
        tts=tts,
        vad=silero.VAD.load(),
        turn_detection="vad",
        allow_interruptions=True,
    )

    await session.start(agent=MyAgent(), room=ctx.room)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

## Model identifiers

Models follow the format `provider/model:variant`. Prefix with `slng/` to target an SLNG-hosted instance:

```
provider/model:variant        # third-party passthrough
slng/provider/model:variant   # SLNG-hosted
```

Examples:

```python
model="deepgram/nova:3"               # Deepgram Nova 3 (passthrough)
model="slng/deepgram/nova:3-en"       # SLNG-hosted Deepgram Nova 3, English
model="elevenlabs/eleven-flash:2.5"   # ElevenLabs Flash v2.5 (passthrough)
```

See the [Models](/models/index) page for the full list of available models.

## STT reference

`slng.STT` streams speech-to-text over WebSocket. It supports multi-endpoint failover.

### Constructor

```python
stt = slng.STT(
    api_key="your-slng-api-key",  # SLNG API key. If omitted, read from the SLNG_API_KEY env var.
    model="deepgram/nova:3",  # Model identifier. Default: "deepgram/nova:3"
    language="en",  # Language code. Default: "en"
    sample_rate=16000,  # Audio sample rate in Hz. Default: 16000
    encoding="pcm_s16le",  # Audio encoding: "pcm_s16le" or "pcm_mulaw". Default: "pcm_s16le"
    buffer_size_seconds=0.064,  # Audio buffer size in seconds. Default: 0.064
    enable_partial_transcripts=True,  # Enable interim results. Default: True
    enable_diarization=False,  # Enable speaker identification. Default: False
    min_speakers=None,  # Minimum speakers for diarization. Default: None
    max_speakers=None,  # Maximum speakers for diarization. Default: None
    vad_threshold=0.5,  # Voice activity detection threshold. Default: 0.5
    vad_min_silence_duration_ms=300,  # Minimum silence for VAD (ms). Default: 300
    vad_speech_pad_ms=30,  # Speech padding for VAD (ms). Default: 30
    model_endpoint=None,  # Optional explicit WebSocket endpoint URL
    model_endpoints=None,  # Optional list of failover endpoints
    slng_base_url="api.slng.ai",  # Gateway host override (self-hosted or staging)
    region_override=None,  # Optional region override, sent as X-Region-Override
    http_session=None,  # Optional reused aiohttp.ClientSession
    # **model_options  # Arbitrary model-specific kwargs (see below)
)
```

Any additional keyword arguments are forwarded as model-specific options. For example, pass `target_language_code="hi-IN"` to override language normalization for Sarvam STT models, or `enable_diarization=True` for Deepgram Nova models.

### Endpoint failover

Pass a list of endpoints to `model_endpoints`. If the first one fails, the plugin tries the next in order:

```python
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model_endpoints=[
        "wss://api.slng.ai/v1/stt/deepgram/nova:3",
        "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en",
    ],
    language="en",
)
```

### Default endpoint

If `model_endpoint` is omitted, the plugin connects to:

```
wss://api.slng.ai/v1/stt/{model}
```

### Region override

To force routing toward one or more preferred platform regions, pass `region_override`. This value is forwarded directly as the platform `X-Region-Override` header.

```python
stt = slng.STT(
    api_key=SLNG_API_KEY,
    model="deepgram/nova:3",
    region_override=["eu-north-1", "us-east-1"],
)
```

See the full region list and override behavior at [docs.slng.ai/region-override](https://docs.slng.ai/region-override).

## TTS reference

`slng.TTS` streams text-to-speech over WebSocket with connection pooling.

### Constructor

```python
tts = slng.TTS(
    api_key="your-slng-api-key",  # SLNG API key. If omitted, read from the SLNG_API_KEY env var.
    model="deepgram/aura:2",  # Model identifier. Default: "deepgram/aura:2"
    voice="aura-2-thalia-en",  # Voice identifier. Default: "default"
    language="en",  # Language code. Default: "en"
    sample_rate=24000,  # Audio sample rate in Hz. Default: 24000
    speed=1.0,  # Speech speed multiplier. Default: 1.0
    model_endpoint=None,  # Optional explicit WebSocket endpoint URL
    slng_base_url="api.slng.ai",  # Gateway host override (self-hosted or staging)
    region_override=None,  # Optional region override, sent as X-Region-Override
    word_tokenizer=None,  # Optional custom tokenize.WordTokenizer
    http_session=None,  # Optional reused aiohttp.ClientSession
    # **model_options  # Arbitrary model-specific kwargs (see below)
)
```

Additional keyword arguments are forwarded to the chosen model's init payload. Known keys by provider:

* **Rime Arcana**: `modelId`, `segment`, `speakingStyle`, `addBreathing`, `addDisfluencies`, `phonemizeBetweenBrackets`, `translateTo`.
* **Sarvam Bulbul**: `pace`, `temperature`, `output_audio_bitrate`, `min_buffer_size`, `max_chunk_length`, `target_language_code`.

### Streaming vs batch

* `tts.stream()` sends text word by word and returns audio chunks in real time. Use this for voice agents.
* `tts.synthesize(text)` does one-shot synthesis. It works fine for previews, but `stream()` is better for interactive agents.

### Default endpoint

If `model_endpoint` is omitted, the plugin connects to:

```
wss://api.slng.ai/v1/tts/{model}
```

### Region override

`TTS` supports the same `region_override` option and forwards it to the platform as `X-Region-Override`. See the full region list and override behavior at [docs.slng.ai/region-override](https://docs.slng.ai/region-override).

### Voice selection

Pick a voice that matches your chosen model. See the [Voices](/voices/deepgram-aura) pages for what's available per provider.

## Provider notes

### Sarvam Bulbul v3 TTS

Works out of the box. The plugin auto-normalizes language codes to BCP-47 on the wire: if you pass `language="hi"`, the plugin sends `"hi-IN"` to Sarvam. To override the normalization (for example, to force a different target language), pass `target_language_code="..."` as a model option keyword argument.
### Sarvam Saaras v3 STT

Saaras on SLNG is HTTP-only (no WebSocket endpoint) and is therefore **not supported** by this plugin's realtime streaming path. For Hindi STT in a voice agent, use `slng/deepgram/nova:3-hi` or `slng/deepgram/nova:3-multi` instead.

### Rime Arcana

Requires a `voice` (speaker) that matches the chosen language. Passing `voice="default"` auto-resolves to a reasonable default per language.

The plugin outputs `linear16` PCM audio internally and registers itself with LiveKit on import. Both `STT` and `TTS` authenticate with `api_key`.

Most new SLNG platform models work without plugin updates, but providers with non-standard WebSocket message formats may require plugin support (for example, Sarvam Bulbul needed nested `data.audio` parsing).

## Next steps

* Browse available [Models](/models/index) for STT and TTS
* Check the [Voices](/voices/deepgram-aura) pages for voice options per provider
* See [Voice Agents](/voice-agents) for the SLNG-managed agents API

# Pipecat plugin for SLNG

Source: https://docs.slng.ai/agents/pipecat-plugin

Use the pipecat-slng Python package to connect a Pipecat pipeline to any STT or TTS model on the SLNG gateway by swapping a single model string.

`pipecat-slng` adds STT and TTS services for [Pipecat](https://github.com/pipecat-ai/pipecat). It routes your pipeline through the SLNG gateway, so you can use any STT or TTS model on SLNG (Deepgram, ElevenLabs, Rime, Sarvam, and more) behind one API key. Swap the `model` string to switch provider; no other code changes needed.

Tested with Pipecat v1.3.0. [BYOK](#bring-your-own-key-byok) requires `pipecat-slng` 0.4.0 or later.

## Prerequisites

* Python 3.11+
* `pipecat-ai>=1.3.0`
* A [Pipecat](https://github.com/pipecat-ai/pipecat) project
* An SLNG API key (get one at [app.slng.ai](https://app.slng.ai/api-keys))

## Installation

```bash
uv add pipecat-slng
# or
pip install pipecat-slng
```

## Credentials

You need an SLNG API key.
Read it from the `SLNG_API_KEY` environment variable:

```bash
export SLNG_API_KEY="your-slng-api-key"
```

Then pass it to each service via `api_key`:

```python
import os

from pipecat_slng import SlngSTTService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
)
```

## Quickstart

Create an STT and a TTS service, then add them to your Pipecat pipeline:

```python
import os

from pipecat_slng import SlngSTTService, SlngTTSService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
)

tts = SlngTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/aura:2-en",
    voice="aura-2-thalia-en",
)
```

`SlngSTTService` and `SlngTTSService` stream over WebSocket, which gives low latency and supports mid-utterance interruption. Common runtime knobs are top-level keyword arguments (`language`, `speed`, `enable_vad`, `enable_partials`). For richer overrides, pass a `SlngSTTSettings(...)` or `SlngTTSSettings(...)` to `settings=`.

## Region routing

Both services support gateway region routing. Pin requests to a specific datacenter with `region_override`, or constrain them to a broad geographic zone with `world_part_override`. When both are set, `region_override` wins.

```python
stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/nova:3-en",
    region_override="eu-north-1",  # ap-southeast-2 | eu-north-1 | us-east-1
    world_part_override="eu",  # ap | eu | na
)
```

The WebSocket services send these as the `X-Region-Override` and `X-World-Part-Override` headers; the HTTP service (below) sends them as the `region` and `world-part` query parameters. See the full region list and override behavior at [docs.slng.ai/region-override](https://docs.slng.ai/region-override).

## Bring your own key (BYOK)

If you already have a contract with an upstream provider, pass your own provider key via `provider_key`.
All three services forward it as the `X-Slng-Provider-Key` header (on the WebSocket upgrade for the streaming services, and on each request for `SlngHttpTTSService`), so the provider bills your account directly and no SLNG audio-minute fees apply, while the SLNG cache still applies on top. See [Bring your own key](/execution-layer/byok) for caching behavior and the supported provider list.

```python
import os

from pipecat_slng import SlngSTTService, SlngTTSService

stt = SlngSTTService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="deepgram/nova:3",  # external route (no slng/ prefix)
    provider_key=os.getenv("SLNG_PROVIDER_KEY"),
)

tts = SlngTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="deepgram/aura:2",  # external route (no slng/ prefix)
    voice="aura-2-thalia-en",
    provider_key=os.getenv("SLNG_PROVIDER_KEY"),
)
```

BYOK only works on **external** catalog routes: model strings without the `slng/` prefix, such as `deepgram/nova:3` or `deepgram/aura:2`. SLNG-hosted `slng/...` routes reject the header with an HTTP 400 ("BYOK is only supported for external STT/TTS routes").

If the upstream provider rejects your key, the failure surfaces as a `backend_connection_failed` error frame over WebSocket, or as the upstream 401/403 with the `X-Slng-Auth-Source: client_key` response header over HTTP.

Since `pipecat-slng` 0.4.0, WebSocket connect-rejection errors include the server's response body, so a misrouted BYOK request reports the reason rather than a bare `HTTP 400`.

## Full voice agent example

A complete cascade pipeline (Speech-to-Text → LLM → Text-to-Speech) using SLNG for STT and TTS and OpenAI for the LLM.
The bot introduces itself when a client connects:

```python
import os

from dotenv import load_dotenv
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments, SmallWebRTCRunnerArguments
from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService
from pipecat.transcriptions.language import Language
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat_slng import SlngSTTService, SlngTTSService

load_dotenv(override=True)


async def run_bot(transport: BaseTransport):
    slng_api_key = os.environ["SLNG_API_KEY"]

    stt = SlngSTTService(
        api_key=slng_api_key,
        model="slng/deepgram/nova:3-en",
        language=Language.EN,
        enable_vad=True,
        enable_partials=True,
        # region_override="eu-north-1",  # uncomment to pin to a datacenter
    )

    # Text-to-Speech (streaming WebSocket: low latency, supports interruption).
    # Deepgram Aura 2 supports `speed`; Rime / Sarvam don't (parameter-coverage
    # table on docs.slng.ai). Swap model= and voice= to change provider.
    tts = SlngTTSService(
        api_key=slng_api_key,
        model="slng/deepgram/aura:2-en",
        voice="aura-2-arcas-en",
        language=Language.EN,
        speed=1,
        # region_override="eu-north-1",
    )

    llm = OpenAIResponsesLLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        settings=OpenAIResponsesLLMService.Settings(
            model=os.getenv("OPENAI_MODEL", "gpt-4.1"),
            system_instruction=(
                "You are a helpful assistant in a voice conversation. "
                "Your responses will be spoken aloud, so avoid emojis, bullet points, "
                "or other formatting that can't be spoken."
            ),
        ),
    )

    context = LLMContext()
    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(vad_analyzer=SileroVADAnalyzer()),
    )

    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            user_aggregator,
            llm,
            tts,
            transport.output(),
            assistant_aggregator,
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(enable_metrics=True, enable_usage_metrics=True),
    )

    @task.rtvi.event_handler("on_client_ready")
    async def on_client_ready(rtvi):
        context.add_message({"role": "user", "content": "Please introduce yourself."})
        await task.queue_frames([LLMRunFrame()])

    runner = PipelineRunner(handle_sigint=False)
    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    match runner_args:
        case SmallWebRTCRunnerArguments():
            from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport

            transport = SmallWebRTCTransport(
                webrtc_connection=runner_args.webrtc_connection,
                params=TransportParams(audio_in_enabled=True, audio_out_enabled=True),
            )
            await run_bot(transport)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
```

The full example, including the Daily transport branch, lives in [`examples/bot.py`](https://github.com/slng-ai/pipecat-slng/blob/main/examples/bot.py). Run it with:

```bash
cp .env.example .env  # set SLNG_API_KEY and OPENAI_API_KEY
uv run --extra example examples/bot.py
```

Then open `http://localhost:7860/client` and start talking. It uses the SmallWebRTC transport by default; pass `-t daily` to use Daily instead (requires `pipecat-ai[daily]`).

Setting `SLNG_PROVIDER_KEY` (your own Deepgram key) in `.env` switches the example into [BYOK](#bring-your-own-key-byok) mode on the external `deepgram/nova:3` / `deepgram/aura:2` routes.

## Model identifiers

Models follow the format `provider/model:variant`.
Prefix with `slng/` to target an SLNG-hosted instance, and suffix the language where the model exposes per-language variants:

```python
model="slng/deepgram/nova:3-en"  # SLNG-hosted Deepgram Nova 3, English (STT)
model="slng/deepgram/aura:2-en"  # SLNG-hosted Deepgram Aura 2, English (TTS)
```

Model strings without the `slng/` prefix are **external** routes, proxied to the provider's own API:

```python
model="deepgram/nova:3"  # Proxied to Deepgram Nova 3 (STT)
model="deepgram/aura:2"  # Proxied to Deepgram Aura 2 (TTS)
```

External routes are required for [BYOK](#bring-your-own-key-byok).

The plugin routes through the SLNG Unmute bridge, so the full list of models you can pass to `model=` is the bridge's supported-models list; see [Supported models](/execution-layer/unified-api-models). Not every model accepts every option (for example `speed` on TTS); check the [parameter coverage](/execution-layer/unified-api-parameters) table before tuning.

## STT reference

`SlngSTTService` streams speech-to-text over WebSocket, connecting to `wss://api.slng.ai/v1/bridges/unmute/stt/{model}`.

### Constructor

```python
stt = SlngSTTService(
    api_key="your-slng-api-key",  # Required. SLNG API key.
    model="slng/deepgram/nova:3-en",  # Model identifier. Default: "slng/deepgram/nova:3-en"
    base_url="api.slng.ai",  # Gateway host (self-hosted or staging). Default: "api.slng.ai"
    encoding="linear16",  # "linear16", "mp3", or "opus". Default: "linear16"
    sample_rate=None,  # Audio sample rate in Hz. Default: the pipeline sample rate
    language=Language.EN,  # Recognition language. Default: English
    enable_vad=True,  # Enable server-side VAD. Default: True
    enable_partials=True,  # Stream interim (partial) transcripts. Default: True
    region_override=None,  # Pin to a datacenter, sent as X-Region-Override
    world_part_override=None,  # Constrain to a zone, sent as X-World-Part-Override
    provider_key=None,  # Your own provider key (BYOK), sent as X-Slng-Provider-Key; external routes only
    settings=None,  # Optional SlngSTTSettings for runtime updates
)
```

`Language` is imported from `pipecat.transcriptions.language`.

### Confidence filter

When the provider surfaces a confidence score, transcripts below `0.5` are dropped before reaching your pipeline.

### Default endpoint

The plugin connects to:

```
wss://api.slng.ai/v1/bridges/unmute/stt/{model}
```

## TTS reference (streaming)

`SlngTTSService` streams text-to-speech over WebSocket, connecting to `wss://api.slng.ai/v1/bridges/unmute/tts/{model}`. This is the recommended path for interactive voice agents.

### Constructor

```python
tts = SlngTTSService(
    api_key="your-slng-api-key",  # Required. SLNG API key.
    model="slng/deepgram/aura:2-en",  # Model identifier. Default: "slng/deepgram/aura:2-en"
    voice="aura-2-thalia-en",  # Voice identifier. Default: None (server default)
    base_url="api.slng.ai",  # Gateway host. Default: "api.slng.ai"
    encoding="linear16",  # "linear16", "mp3", "opus", "mulaw", or "alaw". Default: "linear16"
    sample_rate=None,  # Audio sample rate in Hz. Default: the pipeline sample rate
    language=Language.EN,  # Synthesis language. Default: English
    speed=None,  # Speech speed multiplier. Default: None (server default)
    region_override=None,  # Pin to a datacenter, sent as X-Region-Override
    world_part_override=None,  # Constrain to a zone, sent as X-World-Part-Override
    provider_key=None,  # Your own provider key (BYOK), sent as X-Slng-Provider-Key; external routes only
    settings=None,  # Optional SlngTTSSettings for runtime updates
)
```

### Runtime settings updates

Changing `voice`, `speed`, or `language` mid-session (via Pipecat settings updates) reconnects the WebSocket to re-run the init handshake.
Expect a brief reconnect, not a silent no-op.

### Default endpoint

The plugin connects to:

```
wss://api.slng.ai/v1/bridges/unmute/tts/{model}
```

### Voice selection

Pick a voice that matches your chosen model. See the [Voices](/voices/deepgram-aura) pages for what's available per provider.

## HTTP TTS (non-streaming fallback)

For simple request/response synthesis where streaming is not required, use `SlngHttpTTSService`. It issues one HTTP `POST` per utterance and returns the full audio body in a single frame.

```python theme={null}
import os

from pipecat_slng import SlngHttpTTSService

tts = SlngHttpTTSService(
    api_key=os.getenv("SLNG_API_KEY"),
    model="slng/deepgram/aura:2-en",
    voice="aura-2-thalia-en",
)
```

The HTTP bridge body accepts **only** `{text, voice}`; there is no `config` object. `encoding`, `sample_rate`, `language`, and `speed` are therefore **not configurable over HTTP**, and the server returns its default audio format. `language` and `speed` are kept for API parity with the WebSocket service but are not sent over the wire.

The service auto-detects WAV (decoded to raw PCM at the file's sample rate) and plain PCM (passed through at the pipeline's sample rate). Compressed responses (MP3/Ogg) yield an `ErrorFrame`; use the streaming `SlngTTSService` if you need codec control.

Pass `aiohttp_session=` to reuse a shared `aiohttp.ClientSession`; otherwise one is created internally. Region routing on the HTTP service uses the `region` and `world-part` query parameters instead of headers. [BYOK](#bring-your-own-key-byok) works here too: pass `provider_key` and it is sent as the `X-Slng-Provider-Key` header on each request (external routes only).

## Good to know

Both WebSocket services output `linear16` PCM by default and authenticate with `api_key`. The package exports `SlngSTTService`, `SlngTTSService`, and `SlngHttpTTSService`, plus the `SlngSTTSettings` and `SlngTTSSettings` settings classes.
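
The HTTP service's response handling described above can be pictured with a short standard-library sketch: WAV bodies are decoded to raw PCM at the file's own sample rate, bodies with no recognizable header pass through as PCM at the pipeline rate, and MP3/Ogg bodies are rejected. This is not the plugin's source. `classify_tts_body` and `to_pcm` are hypothetical helpers, the magic-byte checks are an assumed heuristic, and the sketch raises `ValueError` where the plugin emits an `ErrorFrame`.

```python
import io
import wave


def _looks_like_mp3_frame(b: bytes) -> bool:
    """Heuristic MPEG audio frame-header check (sync bits plus non-reserved fields)."""
    if len(b) < 4 or b[0] != 0xFF or (b[1] & 0xE0) != 0xE0:
        return False
    version = (b[1] >> 3) & 0x03  # 0b01 is reserved
    layer = (b[1] >> 1) & 0x03    # 0b00 is reserved
    bitrate = b[2] >> 4           # 0b1111 is invalid
    rate = (b[2] >> 2) & 0x03     # 0b11 is reserved
    return version != 1 and layer != 0 and bitrate != 0x0F and rate != 3


def classify_tts_body(body: bytes) -> str:
    """Guess the container of an HTTP TTS response from its leading bytes."""
    if body[:4] == b"RIFF" and body[8:12] == b"WAVE":
        return "wav"
    if body[:4] == b"OggS":
        return "ogg"
    if body[:3] == b"ID3" or _looks_like_mp3_frame(body):
        return "mp3"
    return "pcm"  # No recognizable header: treat as headerless PCM.


def to_pcm(body: bytes, pipeline_sample_rate: int) -> tuple[bytes, int]:
    """Return (raw PCM bytes, sample rate) following the documented rules."""
    kind = classify_tts_body(body)
    if kind == "wav":
        # Decode the WAV container and keep the file's own sample rate.
        with wave.open(io.BytesIO(body), "rb") as wav:
            return wav.readframes(wav.getnframes()), wav.getframerate()
    if kind == "pcm":
        # Plain PCM passes through at the pipeline's sample rate.
        return body, pipeline_sample_rate
    raise ValueError(f"compressed {kind} response; use the streaming service for codec control")
```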

Prefer the streaming `SlngTTSService` for conversational agents, since it supports mid-utterance interruption. Reserve `SlngHttpTTSService` for batch or non-interactive synthesis.

## Next steps

* Browse the [supported models](/execution-layer/unified-api-models) and [parameter coverage](/execution-layer/unified-api-parameters) for the Unmute bridge
* Read [Bring your own key](/execution-layer/byok) for BYOK caching behavior and supported providers
* Check the [Voices](/voices/deepgram-aura) pages for voice options per provider
* See [Voice Agents](/voice-agents) for the SLNG-managed agents API
* Using LiveKit instead? See the [LiveKit plugin](/agents/livekit-plugin)

# Get current account
Source: https://docs.slng.ai/api-reference/account/get-current-account

/api-reference/me/me.oas.json get /v1/me
Return the account, organization, and API key associated with the bearer token in the request. Use this to confirm which key is in use and which organization and plan tier it belongs to.

# Create agent
Source: https://docs.slng.ai/api-reference/agents/create-agent

/api-reference/agents/agents.oas.yaml post /v1/agents
Create a new voice agent.

# Delete agent
Source: https://docs.slng.ai/api-reference/agents/delete-agent

/api-reference/agents/agents.oas.yaml delete /v1/agents/{agent_id}
Soft-delete a voice agent.

# Duplicate agent
Source: https://docs.slng.ai/api-reference/agents/duplicate-agent

/api-reference/agents/agents.oas.yaml post /v1/agents/{agent_id}/duplicate
Create a server-side copy of an existing voice agent. The duplicate copies the stored agent configuration, including prompts, models, tools, template defaults, selected region, and outbound telephony settings. The duplicate does not copy call history or the inbound connection. SLNG creates and manages a runtime API key for the duplicated agent automatically.
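
For the agent endpoints listed here, a small standard-library builder keeps the host and bearer header in one place. This is a hypothetical helper (`agents_request`, `agent_path`, and `duplicate_agent_request` are illustrative names): it builds but does not send requests, and it does not model the request body schemas, so check each endpoint's reference for the fields it accepts. The host is the Managed Agents host, `https://api.agents.slng.ai`; pass a built request to `urllib.request.urlopen` to send it.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

AGENTS_HOST = "https://api.agents.slng.ai"  # Managed Agents API host


def agents_request(method: str, path: str, api_key: str,
                   body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated Voice Agents API request."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        # JSON-encode the body only for calls that carry one (create, replace, update).
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(AGENTS_HOST + path, data=data,
                                  headers=headers, method=method)


def agent_path(agent_id: str, suffix: str = "") -> str:
    # Percent-encode the ID so it always stays a single path segment.
    return f"/v1/agents/{quote(agent_id, safe='')}{suffix}"


def duplicate_agent_request(agent_id: str, api_key: str) -> urllib.request.Request:
    # POST /v1/agents/{agent_id}/duplicate copies config, not call history or the inbound connection.
    return agents_request("POST", agent_path(agent_id, "/duplicate"), api_key)
```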
# Get agent
Source: https://docs.slng.ai/api-reference/agents/get-agent

/api-reference/agents/agents.oas.yaml get /v1/agents/{agent_id}
Get a single voice agent by ID.

# List agents
Source: https://docs.slng.ai/api-reference/agents/list-agents

/api-reference/agents/agents.oas.yaml get /v1/agents
List all voice agents for your organization.

# Replace agent
Source: https://docs.slng.ai/api-reference/agents/replace-agent

/api-reference/agents/agents.oas.yaml put /v1/agents/{agent_id}
Replace a voice agent (full update).

# Update agent (partial)
Source: https://docs.slng.ai/api-reference/agents/update-agent-partial

/api-reference/agents/agents.oas.yaml patch /v1/agents/{agent_id}
Partially update a voice agent.

# Cognigy STT
Source: https://docs.slng.ai/api-reference/bridges/cognigy-stt-bridge/cognigy-stt-bridge-http

POST /v1/bridges/cognigy/stt/{model_variant}
Transcribe audio via the Cognigy Voice Gateway protocol bridge.

# Cognigy STT
Source: https://docs.slng.ai/api-reference/bridges/cognigy-stt-bridge/cognigy-stt-bridge-ws

Stream live audio to SLNG over the Cognigy Voice Gateway WebSocket protocol and receive real-time STT transcripts from any supported model.

# Cognigy TTS
Source: https://docs.slng.ai/api-reference/bridges/cognigy-tts-bridge/cognigy-tts-bridge-http

POST /v1/bridges/cognigy/tts/{model_variant}
Synthesize speech via the Cognigy Voice Gateway protocol bridge.

# Cognigy TTS
Source: https://docs.slng.ai/api-reference/bridges/cognigy-tts-bridge/cognigy-tts-bridge-ws

Stream synthesized speech from SLNG over the Cognigy Voice Gateway WebSocket protocol using any supported text-to-speech model and voice.

# Jambonz STT
Source: https://docs.slng.ai/api-reference/bridges/jambonz-stt-bridge/jambonz-stt-bridge-http

POST /v1/bridges/jambonz/stt/{model_variant}
Transcribe audio via the Jambonz custom STT protocol bridge. The `model_variant` path parameter specifies the target STT model (e.g., `deepgram/nova:3`, `slng/openai/whisper:large-v3`).
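
The bridge routes share one shape, `/v1/bridges/{bridge}/{stt|tts}/{model_variant}`, where the model variant can itself contain slashes and colons (for example `slng/openai/whisper:large-v3`). Below is a minimal URL-builder sketch, assuming the `api.slng.ai` host and that each WebSocket variant lives at the same path over `wss://`. That matches the Unmute URL the Pipecat plugin connects to; for Cognigy and Jambonz it is an assumption. `bridge_url` and `is_slng_hosted` are hypothetical helpers.

```python
from urllib.parse import quote

BRIDGES = {"cognigy", "jambonz", "unmute"}
KINDS = {"stt", "tts"}


def bridge_url(bridge: str, kind: str, model_variant: str, *,
               websocket: bool = False, host: str = "api.slng.ai") -> str:
    """Build a bridge endpoint URL such as /v1/bridges/jambonz/stt/deepgram/nova:3."""
    if bridge not in BRIDGES or kind not in KINDS:
        raise ValueError(f"unknown bridge route: {bridge}/{kind}")
    scheme = "wss" if websocket else "https"
    # Keep "/" and ":" literal: a model variant spans several path segments.
    return f"{scheme}://{host}/v1/bridges/{bridge}/{kind}/{quote(model_variant, safe='/:')}"


def is_slng_hosted(model_variant: str) -> bool:
    # Model-string convention: a "slng/" prefix targets an SLNG-hosted instance;
    # anything else is an external (proxied) route, which BYOK requires.
    return model_variant.startswith("slng/")
```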
# Jambonz STT
Source: https://docs.slng.ai/api-reference/bridges/jambonz-stt-bridge/jambonz-stt-bridge-ws

Reference for the Jambonz custom WebSocket STT bridge channel, including init, audio, and stop messages and the `model_variant` routing parameter.

# Jambonz TTS
Source: https://docs.slng.ai/api-reference/bridges/jambonz-tts-bridge/jambonz-tts-bridge-http

POST /v1/bridges/jambonz/tts/{model_variant}
Synthesize speech via the Jambonz custom TTS protocol bridge. The `model_variant` path parameter specifies the target TTS model (e.g., `deepgram/aura:2`).

# Jambonz TTS
Source: https://docs.slng.ai/api-reference/bridges/jambonz-tts-bridge/jambonz-tts-bridge-ws

Reference for the Jambonz custom WebSocket TTS bridge channel, including stream, flush, and stop messages, binary audio frames, and `model_variant` routing.

# Dispatch call
Source: https://docs.slng.ai/api-reference/calls/dispatch-call

/api-reference/agents/agents.oas.yaml post /v1/agents/{agent_id}/calls
Dispatch an outbound call for a voice agent. Note: outbound calls require the agent to be configured with an outbound connection.

# Get call
Source: https://docs.slng.ai/api-reference/calls/get-call

/api-reference/agents/agents.oas.yaml get /v1/agents/{agent_id}/calls/{call_id}
Get details of a specific call.

# List calls
Source: https://docs.slng.ai/api-reference/calls/list-calls

/api-reference/agents/agents.oas.yaml get /v1/agents/{agent_id}/calls
List calls for a voice agent (paginated).

# API Reference
Source: https://docs.slng.ai/api-reference/overview

Every SLNG endpoint, grouped by capability: the Unified API, text-to-speech, speech-to-text, voice agents, batch, and orchestrator bridges.

Every SLNG endpoint, grouped by capability. Pick a section to see request and response details, or start with the Unified API to reach any provider through one request shape.

* **Unified API**: One request shape for every STT and TTS model. Swap providers by changing the URL. Includes parameter coverage and supported models.
* **Text-to-speech**: Per-provider TTS endpoints over HTTP and WebSocket, plus pronunciation dictionaries.
* **Speech-to-text**: Per-provider STT endpoints for file transcription and real-time streaming.
* **Voice agents**: Create, configure, and dispatch SLNG voice agents, including outbound calls.
* **Batch**: Submit and manage large transcription and synthesis jobs asynchronously.
* **Orchestrator bridges**: Connect Cognigy and Jambonz voice platforms to SLNG STT and TTS.

# Create web session
Source: https://docs.slng.ai/api-reference/sessions/create-web-session

/api-reference/agents/agents.oas.yaml post /v1/agents/{agent_id}/web-sessions
Create a browser session for a voice agent. This is commonly used for in-browser testing without PSTN.

# Create batch job
Source: https://docs.slng.ai/api-reference/speechmatics/create-batch-job

/api-reference/batch/batch.oas.json post /v1/batch/jobs
Submit audio for asynchronous transcription. Supports file upload (`multipart/form-data`), URL input, and presigned S3 upload (`application/json`).

For the presigned-upload flow, the first call (`mode: "presign"`) returns **200 OK** with an upload URL; the job is not created until step 3 of the [presigned upload flow](/batch-guide). All other submission paths return **202 Accepted** with `status: QUEUED`.

# Delete batch job
Source: https://docs.slng.ai/api-reference/speechmatics/delete-batch-job

/api-reference/batch/batch.oas.json delete /v1/batch/jobs/{jobId}
Delete a completed or failed job. Only jobs in a terminal status (`DONE` or `FAILED`) can be deleted.

# Get batch job
Source: https://docs.slng.ai/api-reference/speechmatics/get-batch-job

/api-reference/batch/batch.oas.json get /v1/batch/jobs/{jobId}
Returns the full details of a job (status, config, timestamps, error info). Poll until `status` reaches `DONE` or `FAILED`.

# Get batch job files
Source: https://docs.slng.ai/api-reference/speechmatics/get-batch-job-files

/api-reference/batch/batch.oas.json get /v1/batch/jobs/{jobId}/files
Returns signed download URLs for the input audio and the output transcripts of a completed job.
Outputs are returned per available format (`json`, `txt`, `srt`); missing formats are omitted.

# List batch jobs
Source: https://docs.slng.ai/api-reference/speechmatics/list-batch-jobs

/api-reference/batch/batch.oas.json get /v1/batch/jobs
Returns a paginated list of jobs for your organization. Supports filtering by status, model, and submission date range, plus sorting.

# Nova 2
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-2/nova-2-http

POST /v1/stt/deepgram/nova:2
Transcribe audio using Deepgram Nova 2 with VAD and speaker diarization.

# Nova 2
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-2/nova-2-ws

Stream real-time speech-to-text transcripts from Deepgram Nova 2 over WebSocket with voice activity detection, speaker diarization, and partial results.

# Nova 3 Medical
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3-medical/nova-3-medical-http

POST /v1/stt/deepgram/nova:3-medical
Transcribe medical audio using Deepgram Nova 3 Medical with specialized vocabulary.

# Nova 3 Medical
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3-medical/nova-3-medical-ws

Stream real-time medical transcription from Deepgram Nova 3 Medical over WebSocket with healthcare-specific vocabulary, VAD, and speaker diarization.

# Nova 3 (English)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-english-http

POST /v1/stt/slng/deepgram/nova:3-en
Transcribe English audio using SLNG-hosted Deepgram Nova 3.

# Nova 3 (English)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-english-ws

Stream real-time English transcripts from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partials, finals, and speaker diarization.

# Nova 3 (Hindi)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-hindi-http

POST /v1/stt/slng/deepgram/nova:3-hi
Transcribe Hindi audio using SLNG-hosted Deepgram Nova 3.
# Nova 3 (Hindi)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-hindi-ws

Stream real-time Hindi speech-to-text from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partial and final transcripts and VAD.

# Nova 3
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-http

POST /v1/stt/deepgram/nova:3
Transcribe audio using Deepgram Nova 3 with VAD and speaker diarization.

# Nova 3 (Indonesian)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-indonesian-ws

Stream real-time Indonesian speech-to-text from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partial and final transcripts and VAD.

# Nova 3 (Kannada)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-kannada-ws

Stream real-time Kannada speech-to-text from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partial and final transcripts and VAD.

# Nova 3 (Marathi)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-marathi-ws

Stream real-time Marathi speech-to-text from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partial and final transcripts and VAD.

# Nova 3 (Multi-Language)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-multi-language-http

POST /v1/stt/slng/deepgram/nova:3-multi
Transcribe multi-language audio using SLNG-hosted Deepgram Nova 3.

# Nova 3 (Multi-Language)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-multi-language-ws

Stream real-time multilingual transcripts from SLNG-hosted Deepgram Nova 3 over WebSocket with automatic language detection across supported languages.

# Nova 3 (Spanish)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-spanish-http

POST /v1/stt/slng/deepgram/nova:3-es
Transcribe Spanish audio using SLNG-hosted Deepgram Nova 3.
# Nova 3 (Spanish)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-spanish-ws

Stream real-time Spanish speech-to-text from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partial and final transcripts and VAD.

# Nova 3 (Telugu)
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-telugu-ws

Stream real-time Telugu speech-to-text from SLNG-hosted Deepgram Nova 3 over WebSocket with low-latency partial and final transcripts and VAD.

# Nova 3
Source: https://docs.slng.ai/api-reference/stt/deepgram-nova-3/nova-3-ws

Stream real-time speech-to-text transcripts from Deepgram Nova 3 over WebSocket with voice activity detection, speaker diarization, and partial results.

# Gradium STT default
Source: https://docs.slng.ai/api-reference/stt/gradium-stt/gradium-stt-default-ws

Real-time multilingual speech-to-text via streaming WebSocket.

# Reson8 STT v1
Source: https://docs.slng.ai/api-reference/stt/reson8-stt-v1/reson8-stt-v1-ws

Real-time speech-to-text transcription using Reson8 via WebSocket. Supports streaming audio with word-level timestamps, confidence scores, and partial results.

# Saaras v3
Source: https://docs.slng.ai/api-reference/stt/sarvam-ai-saaras/saaras-v3-http

POST /v1/stt/sarvam/saaras:v3
Transcribe audio using Sarvam AI Saaras with domain-aware speech recognition for 23 languages and flexible output modes.

# Saaras v3
Source: https://docs.slng.ai/api-reference/stt/sarvam-ai-saaras/saaras-v3-ws

Stream real-time speech-to-text transcripts from Sarvam AI Saaras v3 over WebSocket with voice activity detection across 23 Indian languages. Session configuration is provided via query parameters on the WebSocket upgrade URL: `language-code`, `mode`, `sample_rate`, `input_audio_codec`, `high_vad_sensitivity`, `vad_signals`.
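
Because the Saaras v3 WebSocket takes its session configuration on the upgrade URL, building that URL is mostly query-string encoding. Below is a minimal sketch, assuming the WebSocket path mirrors the HTTP route (`/v1/stt/sarvam/saaras:v3` on `api.slng.ai`) and passing the SLNG key as the `token` query parameter used for WebSocket authentication. `saaras_ws_url` is a hypothetical helper: it checks parameter names against the documented list but not their values, and rendering booleans as lowercase `true`/`false` is an assumption about the server's convention.

```python
from typing import Optional
from urllib.parse import urlencode

# Session parameters the Saaras v3 WebSocket reads from the upgrade URL.
SAARAS_WS_PARAMS = {
    "language-code", "mode", "sample_rate",
    "input_audio_codec", "high_vad_sensitivity", "vad_signals",
}
# Assumed endpoint: mirrors the HTTP route POST /v1/stt/sarvam/saaras:v3.
SAARAS_WS_BASE = "wss://api.slng.ai/v1/stt/sarvam/saaras:v3"


def saaras_ws_url(params: dict, token: Optional[str] = None,
                  base: str = SAARAS_WS_BASE) -> str:
    """Encode session settings (and optionally the SLNG key) as query parameters."""
    unknown = set(params) - SAARAS_WS_PARAMS
    if unknown:
        raise ValueError(f"unsupported session parameters: {sorted(unknown)}")
    # Render booleans as lowercase strings (assumed server convention).
    query = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    if token is not None:
        query["token"] = token
    return f"{base}?{urlencode(query)}"
```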
# Speech AI Real-time v4
Source: https://docs.slng.ai/api-reference/stt/soniox-speech-ai-real-time-v4/speech-ai-real-time-v4-ws

Stream real-time transcripts from Soniox Speech AI v4 over WebSocket with speaker diarization, language detection, and configurable endpoint detection.

# Cartesia Sonic 3
Source: https://docs.slng.ai/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws

Stream low-latency speech synthesis from Cartesia Sonic 3 over WebSocket with a multilingual catalog of voices and context-aware controls.

# Aura 2 (English)
Source: https://docs.slng.ai/api-reference/tts/deepgram-aura-2/aura-2-english-http

POST /v1/tts/slng/deepgram/aura:2-en
Synthesize English speech using SLNG-hosted Deepgram Aura 2.

# Aura 2 (English)
Source: https://docs.slng.ai/api-reference/tts/deepgram-aura-2/aura-2-english-ws

Stream low-latency conversational English text-to-speech from SLNG-hosted Deepgram Aura 2 over WebSocket, optimized for production voice agents.

# Aura 2
Source: https://docs.slng.ai/api-reference/tts/deepgram-aura-2/aura-2-http

POST /v1/tts/deepgram/aura:2
Synthesize speech using Deepgram Aura 2 for conversational voice agents.

# Aura 2 (Spanish)
Source: https://docs.slng.ai/api-reference/tts/deepgram-aura-2/aura-2-spanish-http

POST /v1/tts/slng/deepgram/aura:2-es
Synthesize Spanish speech using SLNG-hosted Deepgram Aura 2.

# Aura 2 (Spanish)
Source: https://docs.slng.ai/api-reference/tts/deepgram-aura-2/aura-2-spanish-ws

Stream low-latency conversational Spanish text-to-speech from SLNG-hosted Deepgram Aura 2 over WebSocket, optimized for production voice agents.

# Aura 2
Source: https://docs.slng.ai/api-reference/tts/deepgram-aura-2/aura-2-ws

Stream low-latency conversational text-to-speech from Deepgram Aura 2 over WebSocket as raw binary frames, optimized for ultra-low-latency voice agents.

# Gradium TTS default
Source: https://docs.slng.ai/api-reference/tts/gradium-tts/gradium-tts-default-http

POST /v1/tts/gradium/tts:default
Real-time multilingual text-to-speech with streaming WebSocket and one-shot HTTP synthesis.

# Gradium TTS default
Source: https://docs.slng.ai/api-reference/tts/gradium-tts/gradium-tts-default-ws

Real-time multilingual text-to-speech with streaming WebSocket and one-shot HTTP synthesis.

# Inworld Max 1.5
Source: https://docs.slng.ai/api-reference/tts/inworld-max-1-5/inworld-max-1-5-http

POST /v1/tts/slng/inworld/max:1.5
Synthesize speech using SLNG-hosted Inworld Max 1.5.

# Inworld Max 1.5
Source: https://docs.slng.ai/api-reference/tts/inworld-max-1-5/inworld-max-1-5-ws

Stream multilingual speech synthesis from SLNG-hosted Inworld Max 1.5 over WebSocket using SLNG's unified low-latency TTS protocol.

# Kugel 1 Turbo
Source: https://docs.slng.ai/api-reference/tts/kugel-1-turbo/kugel-1-turbo-ws

Stream low-latency high-quality speech from KugelAudio Kugel 1 Turbo over WebSocket with expressiveness controls and SLNG's unified TTS protocol.

# Kugel 1
Source: https://docs.slng.ai/api-reference/tts/kugel-1/kugel-1-ws

Stream high-quality speech synthesis from KugelAudio Kugel 1 over WebSocket with expressiveness controls and SLNG's unified TTS protocol.

# Kugel 2 Turbo
Source: https://docs.slng.ai/api-reference/tts/kugel-2-turbo/kugel-2-turbo-ws

Stream low-latency high-quality speech synthesis from KugelAudio Kugel 2 Turbo over WebSocket with expressiveness controls and SLNG's unified TTS protocol.

# Kugel 2
Source: https://docs.slng.ai/api-reference/tts/kugel-2/kugel-2-ws

Stream high-quality speech synthesis from KugelAudio Kugel 2 over WebSocket with expressiveness controls and SLNG's unified TTS protocol.

# Murf Falcon
Source: https://docs.slng.ai/api-reference/tts/murf-falcon/murf-falcon-ws

Stream high-quality multilingual speech synthesis from Murf Falcon over WebSocket with selectable encodings, sample rates, and SLNG's unified TTS protocol.

# Create pronunciation dictionary
Source: https://docs.slng.ai/api-reference/tts/pronunciation-dictionaries/create-pronunciation-dictionary-http

POST /v1/pronunciation/dictionaries
Create a reusable pronunciation dictionary for TTS rewrite rules. For workflow examples, see the [Pronunciation dictionaries guide](/pronunciation-dictionaries).

# Delete pronunciation dictionary
Source: https://docs.slng.ai/api-reference/tts/pronunciation-dictionaries/delete-pronunciation-dictionary-http

DELETE /v1/pronunciation/dictionaries/{name}
Delete one pronunciation dictionary by name from the authenticated organization. For workflow examples, see the [Pronunciation dictionaries guide](/pronunciation-dictionaries).

# Get pronunciation dictionary
Source: https://docs.slng.ai/api-reference/tts/pronunciation-dictionaries/get-pronunciation-dictionary-http

GET /v1/pronunciation/dictionaries/{name}
Read one pronunciation dictionary by name from the authenticated organization. For workflow examples, see the [Pronunciation dictionaries guide](/pronunciation-dictionaries).

# List pronunciation dictionaries
Source: https://docs.slng.ai/api-reference/tts/pronunciation-dictionaries/list-pronunciation-dictionaries-http

GET /v1/pronunciation/dictionaries
List pronunciation dictionaries for the authenticated organization. For workflow examples, see the [Pronunciation dictionaries guide](/pronunciation-dictionaries).

# Arcana v3 (English)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-english-http

POST /v1/tts/slng/rime/arcana:3-en
Synthesize English speech using the Rime Arcana v3 TTS model.

# Arcana v3 (English)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-english-ws

Text-to-Speech API for generating English speech using the Rime Arcana v3 TTS model. Establishes a WebSocket connection for real-time text-to-speech.

# Arcana v3 (French)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-french-http

POST /v1/tts/slng/rime/arcana:3-fr
Synthesize French speech using the Rime Arcana v3 TTS model.

# Arcana v3 (French)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-french-ws

Text-to-Speech API for generating French speech using the Rime Arcana v3 TTS model. Establishes a WebSocket connection for real-time text-to-speech.

# Arcana v3 (Hindi)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-hindi-http

POST /v1/tts/slng/rime/arcana:3-hi
Synthesize Hindi speech using the Rime Arcana v3 TTS model.

# Arcana v3 (Hindi)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-hindi-ws

Text-to-Speech API for generating Hindi speech using the Rime Arcana v3 TTS model. Establishes a WebSocket connection for real-time text-to-speech.

# Arcana v3 (Spanish)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-spanish-http

POST /v1/tts/slng/rime/arcana:3-es
Synthesize Spanish speech using the Rime Arcana v3 TTS model.

# Arcana v3 (Spanish)
Source: https://docs.slng.ai/api-reference/tts/rime-arcana-v3/arcana-v3-spanish-ws

Text-to-Speech API for generating Spanish speech using the Rime Arcana v3 TTS model. Establishes a WebSocket connection for real-time text-to-speech.

# Bulbul Stream v3
Source: https://docs.slng.ai/api-reference/tts/sarvam-ai-bulbul-stream-v3/bulbul-stream-v3-http

POST /v1/tts/sarvam/bulbul-stream:v3
HTTP-streaming multilingual TTS for Indian languages with 30+ speaker voices. Returns raw audio bytes (chunked) in the codec selected via `output_audio_codec`. Unlike `sarvam/bulbul:v3`, no `X-Duration` header is sent and no JSON envelope is used.
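
Because Bulbul Stream v3 returns raw, chunked audio with no JSON envelope and no `X-Duration` header, a client should write bytes as they arrive instead of parsing JSON, and compute the duration itself if it needs one. Below is a minimal sketch under those assumptions. `save_audio_stream` and `pcm16_mono_duration` are hypothetical helpers, and the chunk iterable can come from any HTTP client: with `urllib.request.urlopen`, for example, `iter(lambda: resp.read(4096), b"")` yields the chunks. The duration formula holds only if you selected uncompressed 16-bit mono PCM via `output_audio_codec`; the accepted codec names are not listed here.

```python
from typing import BinaryIO, Iterable


def save_audio_stream(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write raw audio chunks as they arrive and return the total byte count."""
    total = 0
    for chunk in chunks:
        if chunk:  # Skip empty chunks; there is no JSON envelope to strip.
            out.write(chunk)
            total += len(chunk)
    return total


def pcm16_mono_duration(num_bytes: int, sample_rate: int) -> float:
    """Duration in seconds of 16-bit mono PCM (2 bytes per sample)."""
    return num_bytes / (2 * sample_rate)
```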
# Bulbul v3
Source: https://docs.slng.ai/api-reference/tts/sarvam-ai-bulbul-v3/bulbul-v3-http

POST /v1/tts/sarvam/bulbul:v3
Synthesize speech using Sarvam AI Bulbul with high-quality multilingual TTS for Indian languages and 30+ speaker voices.

# Bulbul v3
Source: https://docs.slng.ai/api-reference/tts/sarvam-ai-bulbul-v3/bulbul-v3-ws

Stream multilingual Indian-language speech from Sarvam AI Bulbul v3 over WebSocket with 30+ speaker voices and SLNG's unified low-latency TTS protocol.

# Soniox TTS v1
Source: https://docs.slng.ai/api-reference/tts/soniox-tts-v1/soniox-tts-v1-http

POST /v1/tts/soniox/tts-rt:v1
Real-time text-to-speech with streaming WebSocket and one-shot HTTP synthesis.

# Soniox TTS v1
Source: https://docs.slng.ai/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws

Stream real-time speech synthesis from Soniox TTS v1 over WebSocket with low-latency incremental audio output and SLNG's unified TTS protocol.

# Unified STT
Source: https://docs.slng.ai/api-reference/unified-api/unmute-stt-bridge/unmute-stt-bridge-http

POST /v1/bridges/unmute/stt/{model_variant}
Transcribe audio via SLNG's native WebSocket protocol bridge. The `model_variant` path parameter specifies the target STT model (e.g., `deepgram/nova:3`, `slng/openai/whisper:large-v3`).

# Unified STT
Source: https://docs.slng.ai/api-reference/unified-api/unmute-stt-bridge/unmute-stt-bridge-ws

Stream audio to any SLNG-supported STT model over the unified WebSocket protocol with init, audio, finalize, partial, and final transcript messages.

# Unified TTS
Source: https://docs.slng.ai/api-reference/unified-api/unmute-tts-bridge/unmute-tts-bridge-http

POST /v1/bridges/unmute/tts/{model_variant}
Synthesize speech via SLNG's native WebSocket protocol bridge. The `model_variant` path parameter specifies the target TTS model (e.g., `deepgram/aura:2`).

# Unified TTS
Source: https://docs.slng.ai/api-reference/unified-api/unmute-tts-bridge/unmute-tts-bridge-ws

Stream synthesized audio from any SLNG-supported TTS model over the unified WebSocket protocol with init, text, flush, and binary audio frames.

# Authentication & API Keys
Source: https://docs.slng.ai/authentication

Create and manage SLNG keys in the Dashboard, then authenticate requests over HTTP and WebSocket, including key rotation and bringing your own provider key.

All requests authenticate with an SLNG key. Create and manage keys in the [Dashboard](https://app.slng.ai/api-keys).

| Placeholder    | Replace with                                                          |
| -------------- | --------------------------------------------------------------------- |
| `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) |

## Create and delete keys

Keys are created and deleted from the [API Keys](https://app.slng.ai/api-keys) page in the Dashboard. The secret value is shown once, at creation time. Copy it into your secret store immediately, because you cannot retrieve it later.

SLNG keys are secrets. Store them server-side, never embed them in client code, and delete any key that may have been exposed.
## One key, every host

You use the same SLNG key across SLNG hosts:

| API             | Host                         | Use for                                                 |
| --------------- | ---------------------------- | ------------------------------------------------------- |
| Execution layer | `https://api.slng.ai`        | Requests across all models on the platform              |
| Models          | `https://api.slng.ai`        | Real-time TTS, STT, bridges, pronunciation dictionaries |
| Managed Agents  | `https://api.agents.slng.ai` | Create and manage voice agents under `/v1/agents`       |
| Batch API       | `https://api.batch.slng.ai`  | Asynchronous batch transcription endpoints              |

## Authenticate over HTTP

Pass the key as a bearer token in the `Authorization` header:

```bash highlight={2} theme={null}
curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from SLNG!"}' \
  --output hello.wav
```

## Authenticate over WebSocket

Pass the key on the WebSocket upgrade request, either as an `Authorization` header or as a `token` query parameter when the client cannot set headers:

```text theme={null}
wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en?token=SLNG_API_KEY
```

The browser WebSocket API does not support custom headers. Use the `token` query parameter from the browser, or set the `Authorization` header from a server-side client.

## Rotate a key

Keys cannot be rotated in place. To replace a key without downtime:

1. Generate a new key in the Dashboard and store its secret value.
2. Update each caller to use the new key: backend services and any local environments. Verify traffic on the new key before continuing.
3. Once no caller is using the old key, delete it from the Dashboard. Any request still presenting the deleted key starts failing with `401 Unauthorized`.

Deleting a key invalidates it for all future requests. If a key may have been exposed, create a replacement first, then delete the compromised one.
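
Here is a minimal server-side sketch of the two authentication styles described above, using only the standard library. It reads the key from an environment variable so the key never ships to a client, sends it as a bearer header over HTTP, and appends it as the `token` query parameter for WebSocket clients that cannot set headers. `load_slng_key`, `auth_headers`, and `ws_url_with_token` are hypothetical helpers. The optional `X-Slng-Provider-Key` header is the BYOK header covered under "Bring your own provider key" and applies to external routes only.

```python
import os
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def load_slng_key(env_var: str = "SLNG_API_KEY") -> str:
    """Read the SLNG key from server-side configuration, failing fast if absent."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def auth_headers(api_key: str, provider_key: Optional[str] = None) -> Dict[str, str]:
    """Headers for an HTTP call or a WebSocket upgrade from a server-side client."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if provider_key:
        headers["X-Slng-Provider-Key"] = provider_key  # BYOK: external routes only
    return headers


def ws_url_with_token(url: str, api_key: str) -> str:
    """Append the key as a `token` query parameter, keeping any existing query."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```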
## Bring your own provider key

To run synthesis or transcription against your own upstream provider account, add the `X-Slng-Provider-Key` header alongside your SLNG key. Read the full guide to [Bring Your Own Key setup](/execution-layer/byok).

## Errors

A missing or invalid key returns `HTTP 401`, or an `auth_error` frame over WebSocket. See [Error Codes & Troubleshooting](/reference/errors).

# How to use Batch API
Source: https://docs.slng.ai/batch-guide

Submit audio for asynchronous transcription on the SLNG Batch API using direct file upload, URL input, or presigned S3 upload. Supported formats and limits.

This guide covers the three ways to submit audio for batch transcription. For endpoint details and schemas, see the [Batch API reference](/api-reference/speechmatics/create-batch-job).

The Batch API is served from `https://api.batch.slng.ai`, not `https://api.slng.ai`, and uses the same bearer SLNG key.

**Prerequisites:**

* An [SLNG key](/getting-started)
* Audio in a supported format: `wav`, `mp3`, `flac`, `aac`, `ogg`, `m4a`, `mp4`, `amr`, `mpeg`

## Placeholders

The snippets below use these placeholders. Replace them before running the code.

| Placeholder               | Replace with                                                               |
| ------------------------- | -------------------------------------------------------------------------- |
| `SLNG_API_KEY`            | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys)      |
| `myjob123` / `job8f3a...` | The `job_id` returned by `POST /v1/batch/jobs` (auto-generated if omitted) |
| ``                        | The presigned upload URL returned by the `mode: "presign"` flow            |
| `audio.wav` / `s3_key`    | Your local file path / the S3 key the platform returned at presign time    |

***

## Input Methods

### Method 1: File Upload

Upload the audio file directly using `multipart/form-data`.

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant Batch API
    Client->>Batch API: POST /v1/batch/jobs<br/>multipart: file + config
    Note over Batch API: Queues transcription job
    Batch API-->>Client: 202 – job_id, status: QUEUED
```

```bash theme={null}
curl -X POST https://api.batch.slng.ai/v1/batch/jobs \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -F file=@audio.wav \
  -F transcription_config='{"language":"en"}'
```

### Method 2: URL Input

Provide a publicly accessible HTTPS URL. The system downloads the file (max 1 GB). Presigned S3/GCS URLs work for private files.

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant Batch API
    Client->>Batch API: POST /v1/batch/jobs<br/>JSON: input_url + config
    Note over Batch API: Downloads audio from URL
    Note over Batch API: Queues transcription job
    Batch API-->>Client: 202 – job_id, status: QUEUED
```

```bash theme={null}
curl -X POST https://api.batch.slng.ai/v1/batch/jobs \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_url": "https://example.com/audio.wav",
    "transcription_config": { "language": "en" }
  }'
```

| Field                  | Type   | Required | Description                                              |
| ---------------------- | ------ | -------- | -------------------------------------------------------- |
| `input_url`            | string | yes      | HTTPS URL of the audio file                              |
| `transcription_config` | object | yes      | Transcription settings                                   |
| `job_id`               | string | no       | Alphanumeric job ID (auto-generated if omitted)          |
| `model_code`           | string | no       | Model to use (default: `slng/speechmatics/batch:15.0.0`) |
| `metadata`             | object | no       | Arbitrary key-value metadata                             |

### Method 3: Presigned Upload

Use this method for large files or when you need upload progress tracking. There are three steps: request a presigned URL, upload the file directly to S3, then create the job referencing the upload.

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant Batch API
    participant S3
    Client->>Batch API: POST /v1/batch/jobs<br/>JSON: mode=presign + filename
    Batch API-->>Client: 200 – job_id + presigned upload URL
    Client->>S3: PUT presigned URL<br/>binary audio file
    S3-->>Client: 200 OK
    Client->>Batch API: POST /v1/batch/jobs<br/>JSON: job_id + s3_key + config
    Note over Batch API: Queues transcription job
    Batch API-->>Client: 202 – job_id, status: QUEUED
```

**Step 1: Request a presigned upload URL**

```bash theme={null}
curl -X POST https://api.batch.slng.ai/v1/batch/jobs \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "presign",
    "filename": "audio.wav",
    "transcription_config": { "language": "en" }
  }'
```

This call returns **200 OK**, and no job is created yet. The response carries the `job_id`, the validated `transcription_config` (and any other config you sent, echoed back), the future S3 locations, and an `upload` object with the presigned `PUT` URL plus the headers your upload must send:

```json theme={null}
{
  "job_id": "job8f3a...",
  "transcription_config": {
    "language": "en",
    "operating_point": "standard",
    "diarization": "none"
  },
  "input_s3_uri": "s3://slng-production-batch-input/inputs/job8f3a.../audio.wav",
  "output_s3_prefix": "s3://slng-production-batch-output/outputs/job8f3a.../",
  "upload": {
    "url": "https://slng-production-batch-input.s3.amazonaws.com/...",
    "s3_key": "inputs/job8f3a.../audio.wav",
    "s3_uri": "s3://slng-production-batch-input/inputs/job8f3a.../audio.wav",
    "expires_in": 1800,
    "headers": { "Content-Type": "application/octet-stream" }
  }
}
```

Carry the returned `transcription_config` (and any echoed `tracking` / `output_config`) verbatim into step 3.

**Step 2: Upload the file to S3**

Send the bytes with the exact headers from `upload.headers` (typically `Content-Type: application/octet-stream`):

```bash theme={null}
curl -X PUT "" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @audio.wav
```

**Step 3: Create the job**

Reference the `job_id` and `s3_key` from step 1.
```bash theme={null}
curl -X POST https://api.batch.slng.ai/v1/batch/jobs \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "job_id": "job8f3a...",
    "s3_key": "inputs/job8f3a.../audio.wav",
    "transcription_config": {
      "language": "en",
      "operating_point": "standard",
      "diarization": "none"
    }
  }'
```

***

## Job Lifecycle

After submission, a job moves through these statuses:

`QUEUED` → `IN_PROGRESS` → `DECODING` → `POST_PROCESSING` → `DONE` / `FAILED`

A successfully deleted job moves to `DELETED`, which is also terminal.

Poll `GET /v1/batch/jobs/{jobId}` until the status reaches `DONE` or `FAILED`. The response includes the full job record (config, timestamps, and `error_message` on failure); see the [API reference](/api-reference/speechmatics/get-batch-job) for the schema.

```mermaid theme={null}
sequenceDiagram
  participant Client
  participant Batch API
  Client->>Batch API: GET /v1/batch/jobs/{jobId}
  Batch API-->>Client: status: IN_PROGRESS
  Note over Client: Wait and retry
  Client->>Batch API: GET /v1/batch/jobs/{jobId}
  Batch API-->>Client: status: DONE
  Client->>Batch API: GET /v1/batch/jobs/{jobId}/files
  Batch API-->>Client: Signed download URLs<br/>(JSON, TXT, SRT transcripts)
```

```bash theme={null}
curl https://api.batch.slng.ai/v1/batch/jobs/myjob123 \
  -H "Authorization: Bearer SLNG_API_KEY"
```

Once the job is `DONE`, retrieve the transcript:

```bash theme={null}
curl https://api.batch.slng.ai/v1/batch/jobs/myjob123/files \
  -H "Authorization: Bearer SLNG_API_KEY"
```

This returns signed download URLs for the input audio and the output transcripts. Each output entry carries a `format` field: `json`, `txt`, or `srt`. Signed URLs expire after roughly one hour.

To list jobs instead of polling a single one, use `GET /v1/batch/jobs`. It supports pagination (`page`, `page_size`), filtering by `status`, `model_code`, and submission date range, and sorting by `submitted_at`, `status`, or `model_code`. See the [API reference](/api-reference/speechmatics/list-batch-jobs) for the full parameter list.

***

## Transcription Config

| Field              | Type    | Default    | Description                                          |
| ------------------ | ------- | ---------- | ---------------------------------------------------- |
| `language`         | string  | `en`       | Language code for the audio                          |
| `operating_point`  | string  | `standard` | Accuracy level: `standard` or `enhanced`             |
| `diarization`      | string  | `none`     | Speaker separation: `none`, `speaker`, or `channel`  |
| `domain`           | string  | —          | Domain-specific language model                       |
| `output_locale`    | string  | —          | Locale for formatting (dates, numbers)               |
| `enable_entities`  | boolean | —          | Detect and format entities (dates, times, addresses) |
| `additional_vocab` | array   | —          | Custom words to improve recognition                  |

Example with diarization and enhanced accuracy:

```json theme={null}
{
  "transcription_config": {
    "language": "en",
    "operating_point": "enhanced",
    "diarization": "speaker"
  }
}
```

***

## Limits

| Limit                       | Value                                         |
| --------------------------- | --------------------------------------------- |
| Max file size               | 1 GB                                          |
| Presigned upload URL expiry | 30 minutes                                    |
| Supported audio formats     | wav, mp3, flac, aac, ogg, m4a, mp4, amr, mpeg |
| URL input scheme            | HTTPS only                                    |

***

## Next Steps

* Endpoint details, request schemas, and response formats
* Set up your SLNG key and make your first request

# SLNG changelog

Source: https://docs.slng.ai/changelog

Release notes for the SLNG speech and language API: new TTS and STT models, voice agent features, breaking changes, and bug fixes.

## Gradium STT and TTS

[Gradium STT](/api-reference/stt/gradium-stt/gradium-stt-default-ws) streams real-time multilingual transcription over WebSocket, with `Flush` and `EndOfStream` controls for finalizing buffered audio. [Gradium TTS](/api-reference/tts/gradium-tts/gradium-tts-default-ws) ships alongside it for streaming WebSocket synthesis and one-shot [HTTP synthesis](/api-reference/tts/gradium-tts/gradium-tts-default-http).

## Kugel 2 Turbo TTS

[Kugel 2 Turbo](/api-reference/tts/kugel-2-turbo/kugel-2-turbo-ws) is the new low-latency tier in the Kugel TTS family. Stream high-quality synthesis with expressiveness controls over the unified TTS WebSocket protocol, with word timestamps and per-chunk completion events.

## Rime Arcana v3 French

[Rime Arcana v3 French](/api-reference/tts/rime-arcana-v3/arcana-v3-french-ws) joins the Arcana v3 lineup. Synthesize French speech via WebSocket or [HTTP](/api-reference/tts/rime-arcana-v3/arcana-v3-french-http) using speakers like `livet_aurelie`.

## ElevenLabs WebSocket controls

The ElevenLabs TTS WebSocket endpoints, including [Flash v2.5](/api-reference/tts/eleven-flash-v2-5/elevenlabs-flash-v2-5-ws), [v3](/api-reference/tts/eleven-v3/elevenlabs-v3-ws), and [Multilingual v2](/api-reference/tts/multilingual-v2/elevenlabs-multilingual-v2-ws), accept new init parameters: `auto_mode` to cut latency by skipping chunk scheduling, `sync_alignment` to attach alignment data to each audio chunk, `enable_ssml_parsing` for inline SSML, `inactivity_timeout` (1–180 s), and `chunk_length_schedule`.
On enterprise plans, set `enable_logging: false` on both WebSocket and HTTP endpoints for zero-retention mode.

## Execution Layer documentation

The docs portal is reframed around the [Execution Layer](/execution-layer), the three-stage path that sits between your orchestrator and the models. Read how [STT Routing](/execution-layer/stt-routing), [Tiered Decisioning](/execution-layer/tiered-decisioning), and [Output Assembly](/execution-layer/output-assembly) cut latency and model cost across a voice call. The new [How It Works](/execution-layer/how-it-works) page walks through a single turn end to end.

## Streamlined navigation and a single API Reference

The top bar collapses to **Documentation**, **Execution Layer**, **API Reference**, and **Models**. Every endpoint now lives under one [API Reference](/api-reference/overview), grouped by Unified API, TTS, STT, Voice Agents, Batch, and Bridges, so you can find a model without switching tabs.

## Consolidated authentication and BYOK pages

API key creation, rotation, and request authentication are merged into a single [Authentication & API Keys](/authentication) page. Bring-your-own-key setup for every supported provider now lives on one [BYOK](/execution-layer/byok) page covering caching behavior and provider key handling.

## Pipecat plugin

The [`pipecat-slng`](/agents/pipecat-plugin) Python package connects a [Pipecat](https://github.com/pipecat-ai/pipecat) pipeline to any STT or TTS model on the SLNG gateway. It ships `SlngSTTService` and `SlngTTSService` for low-latency streaming over WebSocket, plus `SlngHttpTTSService` for non-streaming synthesis. Swap the `model` string to switch provider or region without changing your pipeline code.

## New Sarvam Bulbul and Soniox voices

[Sarvam Bulbul v3](/api-reference/tts/sarvam-ai-bulbul-v3/bulbul-v3-ws) adds the Tamil voice `ta-IN-rahul` and the multilingual `shubh`, while `gu-IN-rahul` is retired.
[Soniox TTS v1](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws) adds four English voices: `Maya`, `Daniel`, `Noah`, and `Nina`.

## Updated Cartesia Sonic 3 voice metadata

The 745-voice [Cartesia Sonic 3](/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws) catalog has refreshed metadata. Voice IDs are unchanged, so existing requests keep working.

## Gateway spec refinements

Patch-level updates to the unified gateway specification clean up schemas across the TTS, STT, and bridge surfaces. Existing requests and responses keep working without changes.

## Inworld Max 1.5 TTS

[Inworld Max 1.5](/api-reference/tts/inworld-max-1-5/inworld-max-1-5-ws) is available on the SLNG-hosted TTS API in `us-east-1`. Synthesize up to 2,000 characters per request across 16 languages, pick from 100+ named voices, and tune output with `temperature`, `language` (BCP-47), and word- or character-level timestamps.

## Asia South region for Nova 3 English and Rime Arcana v3

[Deepgram Nova 3 English](/api-reference/stt/deepgram-nova-3/nova-3-english-ws) and [Rime Arcana v3 English](/api-reference/tts/rime-arcana-v3/arcana-v3-english-ws) can now route to `asia-south1`. Set `X-Region-Override: asia-south1` to keep traffic in-region for users in India.

## Region rename for Nova 3 Indian languages

The Hindi, Marathi, Kannada, Tamil, and Telugu variants of [Deepgram Nova 3](/api-reference/stt/deepgram-nova-3/nova-3-ws) now expose a single `asia-south1` region. The legacy `ap-south-1` value is no longer accepted on `X-Region-Override`; switch any pinned requests to `asia-south1`.

## Supported LLMs for Voice Agents

The `llm` field on [Voice Agents](/voice-agents) is now a closed set: `bedrock-mantle/nvidia.nemotron-super-3-120b`, `bedrock-mantle/nvidia.nemotron-nano-3-30b`, and `groq/openai/gpt-oss-120b`. Pick one of these IDs when you create or update an agent so requests route to a live model.
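
Because `llm` now accepts only the three IDs listed above, a client can catch an unsupported value before it sends a create or update request. This is a minimal sketch, not part of any SLNG SDK. `is_supported_llm` is a hypothetical helper, and it assumes the agent payload is a plain dict with a top-level `llm` key.

```python
# Hypothetical preflight check for the Voice Agents `llm` field.
# The IDs are the closed set listed in the "Supported LLMs for Voice Agents" entry.
SUPPORTED_AGENT_LLMS = frozenset(
    {
        "bedrock-mantle/nvidia.nemotron-super-3-120b",
        "bedrock-mantle/nvidia.nemotron-nano-3-30b",
        "groq/openai/gpt-oss-120b",
    }
)


def is_supported_llm(agent_payload: dict) -> bool:
    """Return True if the payload's top-level `llm` value is in the supported set."""
    return agent_payload.get("llm") in SUPPORTED_AGENT_LLMS


if __name__ == "__main__":
    payload = {"llm": "groq/openai/gpt-oss-120b"}  # other agent fields omitted
    print(is_supported_llm(payload))  # True
```

You can run the same check over stored agent configs to find agents that still point at a model that is no longer routable.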
## Moonshot Kimi K2 removed from Voice Agents

`groq/moonshotai/kimi-k2-instruct-0905` is no longer routable for Voice Agents. Switch existing agents and example payloads to `groq/openai/gpt-oss-120b` (or another model from the supported list) to keep calls working.

## Segment correlation for Cartesia Sonic 3

[Cartesia Sonic 3](/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws) audio chunks now carry a `flush_id`. The counter starts at `0` and increments each time you send a `text` input with `flush: true` or a standalone flush message. Use it to map streamed audio frames back to the text segment that produced them.

## Pronunciation hints on Cognigy and Jambonz TTS bridges

The [Cognigy TTS bridge](/api-reference/bridges/cognigy-tts-bridge/cognigy-tts-bridge-ws) and [Jambonz TTS bridge](/api-reference/bridges/jambonz-tts-bridge/jambonz-tts-bridge-ws) accept an optional `pronunciation` object on `text` messages. Set `mode` (`ipa` or `ssml`), `name`, and `dictionary_id` to override how a term is spoken per chunk on providers that support [pronunciation dictionaries](/pronunciation-dictionaries).

## Binary audio frames on the Unmute TTS bridge

The [Unmute TTS bridge](/api-reference/unified-api/unmute-tts-bridge/unmute-tts-bridge-ws) now sends raw binary WebSocket frames instead of JSON `audio_chunk`, `segment_start`, and `segment_end` envelopes. The encoding and sample rate match the values you pass in the init message, so you can write the bytes straight to your audio pipeline. The `clear` control message has been removed; error and `audio_end` payloads now carry their fields under a `data` envelope (with `message` and `code` duplicated at the top level for backward compatibility).

## BYOK support for Kugel TTS

You can now bring your own Kugel API key when calling [Kugel 1 Turbo](/api-reference/tts/kugel-1-turbo/kugel-1-turbo-ws), [Kugel 1](/api-reference/tts/kugel-1/kugel-1-ws), and [Kugel 2](/api-reference/tts/kugel-2/kugel-2-ws).
Pass it on the `X-Slng-Provider-Key` header so Kugel bills your account directly, while the SLNG gateway's TTS cache still serves repeat requests. See [Bring your own key](/execution-layer/byok) for the full provider list.

## HTTP streaming for Sarvam Bulbul v3

[Sarvam AI Bulbul Stream v3](/api-reference/tts/sarvam-ai-bulbul-stream-v3/bulbul-stream-v3-http) is now available as a chunked HTTP endpoint. The response is raw audio bytes in the codec you set with `output_audio_codec` (there is no JSON envelope and no `X-Duration` header), so you can start playback as soon as the first chunk arrives. Use it to reach the same 30+ Indian-language voices as `bulbul:v3` with lower time-to-first-byte.

## Streaming transcription for Sarvam Saaras v3

Sarvam AI [Saaras v3](/api-reference/stt/sarvam-ai-saaras/saaras-v3-ws) is now available over WebSocket for real-time transcription across 23 Indian languages. Configure each session with query parameters on the upgrade URL (`language-code`, `mode`, `sample_rate`, `input_audio_codec`, `high_vad_sensitivity`, and `vad_signals`) to tune voice activity detection and audio handling per stream.

## HTTP transcription for Nova 3 Indic languages

Deepgram Nova 3 now exposes HTTP endpoints for [Kannada](/api-reference/stt/deepgram-nova-3/nova-3-kannada-http), [Marathi](/api-reference/stt/deepgram-nova-3/nova-3-marathi-http), [Tamil](/api-reference/stt/deepgram-nova-3/nova-3-tamil-http), and [Telugu](/api-reference/stt/deepgram-nova-3/nova-3-telugu-http). You can transcribe these languages with a single request (binary upload or `url` field) instead of opening a WebSocket.

## Soniox TTS language coverage

[Soniox TTS](/voices/soniox) now ships voices for 60+ languages, including Afrikaans, Bengali, Bulgarian, Catalan, Czech, Dutch, Greek, Gujarati, Hebrew, Indonesian, Kannada, Malay, Marathi, Norwegian, Tamil, Telugu, and more.
Set the `language` field to any supported ISO 639-1 code on the [Soniox TTS HTTP endpoint](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-http) to pick a voice catalog.

## Runtime variables for voice agents

Voice agents can now declare `runtime_variables` so the model can capture values during a call and reuse them later in webhook URLs or system tool arguments. The built-in `set_runtime_variables` tool writes the values, which persist for the lifetime of the call or web session. See [Voice agents](/voice-agents) for the setup pattern.

## New agent regions and world parts

The [region override](/region-override) catalog now includes the `eu-non-eu` and `me` world parts, plus the `asia-south1`, `asia-southeast2`, and `australia-southeast1` regions. Pin requests to Mumbai, Jakarta, or Sydney with `X-Region-Override`, or target a world part with `X-World-Part-Override`: use `eu-non-eu` for a non-EU European region, or `me` to stay inside the Middle East.

## Voice agents no longer need a separate API key

Voice agent create and duplicate requests no longer accept `slng_api_key`. Agents now use the API key you authenticate the request with, so drop the field from your payloads. Agent duplication also stops copying the inbound connection and call history; reconnect inbound routing on the copy if you need it.

## Deepgram Aura 2 region availability

SLNG-hosted [Deepgram Aura 2 English](/api-reference/tts/deepgram-aura-2/aura-2-english-http) is no longer available in `us-east-1`. Route English TTS to `eu-north-1`. [Aura 2 Spanish](/api-reference/tts/deepgram-aura-2/aura-2-spanish-http) drops the `na` world part and now serves only `eu`.

## Pronunciation dictionaries for TTS

You can now create reusable [pronunciation dictionaries](/pronunciation-dictionaries) and attach them to any SLNG TTS request. Define rewrite rules once for brand names, acronyms, or domain terms, then reference the dictionary from HTTP, WebSocket, or Unified TTS calls.
Manage dictionaries through the new [pronunciation dictionary endpoints](/api-reference/tts/pronunciation-dictionaries/create-pronunciation-dictionary-http).

## Full voice catalogs on provider pages

Provider voice pages now list every voice in the catalog instead of capping the table at ten per language. The [Cartesia Sonic 3](/voices/cartesia-sonic-3) page shows all 745 voices, [Sarvam Bulbul](/voices/sarvam-bulbul) shows 405, [Deepgram Aura](/voices/deepgram-aura) shows 91, [Soniox](/voices/soniox) shows 120, [Murf](/voices/murf) shows 111, and [Kugel](/voices/kugel) shows 100. You can search and copy any voice ID directly from the provider page.

## Batch API reference now matches the gateway

The [Batch API reference](/api-reference/speechmatics/create-batch-job) and [Batch API guide](/batch-guide) are realigned with the live `api.batch.slng.ai` gateway. Request and response schemas, supported audio formats, and the three submission flows (direct upload, URL input, and presigned S3 upload) reflect what the service actually accepts.

## URL-based audio for HTTP transcription

The [Whisper Large v3](/api-reference/stt/whisper-large-v3/whisper-large-v3-http), [Cognigy STT bridge](/api-reference/bridges/cognigy-stt-bridge/cognigy-stt-bridge-http), [Jambonz STT bridge](/api-reference/bridges/jambonz-stt-bridge/jambonz-stt-bridge-http), and [Unmute STT bridge](/api-reference/unified-api/unmute-stt-bridge/unmute-stt-bridge-http) HTTP endpoints now accept a `url` field pointing to a publicly accessible audio file. Send a JSON body with `url` and `language` instead of a multipart upload when your audio is already hosted somewhere.

## Deepgram Nova 3 English region change

SLNG-hosted [Deepgram Nova 3 English](/api-reference/stt/deepgram-nova-3/nova-3-english-http) is no longer available in `eu-north-1`. Route English transcription to `australia-southeast1` or `us-east-1` instead.
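
The URL-based transcription entry above replaces a multipart upload with a small JSON body. Below is a minimal sketch that uses only the Python standard library. It rests on two assumptions. First, `endpoint_url` stands for the HTTP transcription URL from the linked API reference; the example value is a placeholder. Second, the endpoint is assumed to honor the `X-Region-Override` header that appears in the region entries of this changelog. The hypothetical helper builds the request but does not send it.

```python
import json
import urllib.request
from typing import Optional


def build_url_transcription_request(
    endpoint_url: str,
    api_key: str,
    audio_url: str,
    language: str,
    region: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON transcription request for already-hosted audio."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if region is not None:
        # Assumption: this endpoint accepts the same region header used elsewhere.
        headers["X-Region-Override"] = region
    body = json.dumps({"url": audio_url, "language": language}).encode("utf-8")
    return urllib.request.Request(endpoint_url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_url_transcription_request(
        "https://example.com/stt",  # placeholder, not a real SLNG endpoint
        "SLNG_API_KEY",
        "https://example.com/audio.wav",
        "en",
        region="us-east-1",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping the region optional means the header is only added when you need to pin traffic. For example, a request that was pinned to `eu-north-1` for Nova 3 English would need to move to one of the regions listed in the entry above.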
## voiceai CLI

The new [`voiceai` CLI](/sdks/cli) runs text-to-speech and speech-to-text from a terminal. Install it with `curl`, Homebrew, or `npm`, then pipe audio between SLNG models and other tools without writing an HTTP client.

## JavaScript and Python SDKs

The typed [JavaScript SDK](/sdks/javascript) (`voiceai-sdk` on npm) supports Node, Bun, and Deno. The [Python SDK](/sdks/python) (`voiceai-sdk` on PyPI) ships sync and async clients for Python 3.9+. Both wrap the full STT and TTS surface so you can drop the raw `fetch` and WebSocket plumbing.

## Agent skills for coding agents

The [`slng-ai/skills`](/sdks/skills) pack teaches Claude Code and similar coding agents to call SLNG directly. Point your agent at the skills repo and it can pick models, build init messages, and stream audio for you.

## LiveKit Agents plugin

The [`livekit-plugins-slng`](/agents/livekit-plugin) Python package connects [LiveKit Agents](https://docs.livekit.io/agents/) to any STT or TTS model on the SLNG gateway with a single configuration switch. You can swap providers or regions without changing your agent code.

## Embed a voice agent on the web

A new [browser embed guide](/agents/embed-web) walks through adding a SLNG voice session to any web page. It uses LiveKit, a React frontend, and a backend proxy that keeps your API key off the client.

## Rime Arcana v3 Spanish

[Rime Arcana v3 (Spanish)](/api-reference/tts/rime-arcana-v3/arcana-v3-spanish-ws) is available as a new TTS endpoint, with both streaming WebSocket and one-shot HTTP synthesis. Choose from ten Spanish voices (`aurelio`, `celestino`, `lark`, `luz`, `mar`, `nova`, `pola`, `seraphina`, `sirius`, and `ursa`) and pass `model: rime/arcana:3-es` on init.

## Rime Arcana v3 adds eu-north-1

[Arcana v3 English](/api-reference/tts/rime-arcana-v3/arcana-v3-english-ws), [Hindi](/api-reference/tts/rime-arcana-v3/arcana-v3-hindi-ws), and Spanish are now available in `eu-north-1`.
You can route Arcana v3 synthesis to North Europe for lower latency in that region.

## Deepgram Nova 3 adds asia-south1

[Deepgram Nova 3](/api-reference/stt/deepgram-nova-3/nova-3-multi-language-ws) Tamil, Telugu, Marathi, and Kannada are now available in `asia-south1`, in addition to `ap-south-1`. You can route South Asian language transcription to the Mumbai GCP region.

## Deepgram Nova 3 Spanish region change

Deepgram Nova 3 Spanish is no longer available in `ap-southeast-2`. Use `australia-southeast1` or `us-east-1` instead.

## Soniox TTS region override removed

The `x-region` header is no longer accepted on [Soniox TTS v1](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws) requests. Soniox TTS runs only in `na`, so requests are routed there automatically.

## Expanded Murf Falcon voice catalog

The [Murf Falcon](/voices/murf) voice catalog now lists 133 voices across more than 20 locales, including new entries for Bengali, Tamil, Telugu, Gujarati, Kannada, Punjabi, Japanese, Korean, Portuguese, Dutch, Polish, Greek, Croatian, and Scottish English. You can browse voices by full locale code and copy the `voice_id` for use in your init message.

## Simplified Unmute TTS bridge requests

The [Unmute TTS bridge](/api-reference/unified-api/unmute-tts-bridge/unmute-tts-bridge-http) no longer requires a `model` field in HTTP requests or WebSocket init messages. The model is now inferred from the `{model_variant}` path. Send only `voice` and `text` for HTTP requests, or `voice` plus optional `config` on the WebSocket init.

## Rime Coda Indonesian TTS

[Rime Coda](/api-reference/tts/rime-coda/coda-indonesian-ws) is available as a new TTS model in the `asia-southeast2` (Jakarta) region. It synthesizes Indonesian (Bahasa Indonesia) with low latency across four voices (`pujianti_plesmita`, `siswoko_sigit`, `taryadi_dani`, and `usmany_tatianna`) and supports streaming WebSocket and one-shot HTTP synthesis.
## Region and world-part query parameters on bridges

The Cognigy, Jambonz, and Unmute HTTP bridges now accept `?region=` and `?world-part=` query parameters, mirroring the `X-Region-Override` and `X-World-Part-Override` headers. Use the query form when your platform cannot set custom headers; if both are present, the header wins. See [Integrations overview](/integrations/overview#region-and-world-part-overrides) for details.

## Cartesia Sonic 3, Murf Falcon, Kugel, Soniox, Reson8, and Sarvam in the Unified API

The [Unified API](/execution-layer/unified-api-models) now routes to Cartesia Sonic 3, Murf Falcon, KugelAudio Kugel 1/1-Turbo/2, Soniox TTS v1, Soniox Speech AI v4, Reson8 STT v1, and Sarvam Saaras v3. You can swap between these providers using a single request shape: pass identifiers like `cartesia/sonic:3`, `murf/murftts:falcon`, `kugelaudio/kugel:2`, `soniox/tts-rt:v1`, `soniox/speech-ai:rt-v4`, or `reson8/reson8stt:v1`.

## Webhook tools support custom HTTP methods and raw payloads

[Voice agent webhook tools](/examples/agents-config#system-triggered-webhooks) now accept `http_method` (`POST`, `PUT`, `PATCH`, or `DELETE`) and `webhook_format` (`envelope` or `raw`). Set `webhook_format` to `raw` to send only the tool arguments when the receiving service cannot parse the SLNG envelope.

## Tool execution tracking for voice agents

You can record webhook, template, human-transfer, and built-in tool activity against a call by posting to the [tool executions endpoint](/api-reference/calls/submit-tool-execution). Each record carries the outcome, duration, and HTTP status, and submitted executions surface on the [call detail response](/api-reference/calls/get-call) for debugging or analytics.

## ElevenLabs Flash v2.5 adds Asia Pacific region

[ElevenLabs Flash v2.5](/api-reference/tts/eleven-flash-v2-5/elevenlabs-flash-v2-5-ws) is now available in `ap`, in addition to `eu`. You can route synthesis to Asia Pacific endpoints for lower latency in that region.
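
For platforms that cannot set custom headers, you can append the bridge query parameters described above to an existing bridge URL. Below is a minimal sketch with Python's `urllib.parse`. The bridge URL in the example is a placeholder, and `with_routing_params` is a hypothetical helper. If you can set the headers instead, remember that the header wins when both are present.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_routing_params(
    bridge_url: str,
    region: Optional[str] = None,
    world_part: Optional[str] = None,
) -> str:
    """Add `region` and/or `world-part` query parameters to a bridge URL.

    Other existing parameters are kept; a repeated key collapses to its last value.
    """
    parts = urlsplit(bridge_url)
    query = dict(parse_qsl(parts.query))
    if region is not None:
        query["region"] = region
    if world_part is not None:
        query["world-part"] = world_part
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Placeholder URL; use the bridge URL from your integration settings.
    print(with_routing_params("https://bridge.example.com/tts?voice=a", region="eu-north-1"))
    # prints https://bridge.example.com/tts?voice=a&region=eu-north-1
```

Rebuilding the query through `parse_qsl` and `urlencode` avoids producing a second `?` when the bridge URL already carries parameters.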
## Nova 3 multi-language adds eu-north-1

[Deepgram Nova 3 multi-language](/api-reference/stt/deepgram-nova-3/nova-3-multi-language-ws) is now available in `eu-north-1` as a specific region (in addition to the broader `eu` world part), giving you direct routing to North Europe.

## Deepgram Aura 2 English region change

[Deepgram Aura 2 English](/api-reference/tts/deepgram-aura-2/aura-2-english-ws) is no longer available in `ap-southeast-2`. Use `eu-north-1` or `us-east-1` instead.

## Nova 3 Hindi region change

[Deepgram Nova 3 Hindi](/api-reference/stt/deepgram-nova-3/nova-3-hindi-ws) is no longer available in `ap-southeast-2`. Use `asia-south1` instead.

## Soniox TTS v1 general availability

[Soniox TTS](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws) graduates from preview to v1. Update your client to call `tts-rt:v1` and set `model` to `tts-rt-v1`. The `v1-preview` path and `tts-rt-v1-preview` model identifier are retired.

## Deepgram Aura 2 voice selection now required

The `model` field is now required on every [Deepgram Aura 2 English](/api-reference/tts/deepgram-aura-2/aura-2-english-ws) and [Spanish](/api-reference/tts/deepgram-aura-2/aura-2-spanish-ws) request. The previous defaults (`aura-2-thalia-en` and `aura-2-celeste-es`) no longer apply, so you must pick a voice explicitly.

## Kugel 2 TTS

[Kugel 2](/api-reference/tts/kugel-2/kugel-2-ws) is available as a new TTS model in `eu`. It offers 87 voices with expressiveness control across 26 languages, including Arabic, Chinese, Hindi, Japanese, Korean, and Vietnamese.

## Soniox TTS v1-preview

[Soniox TTS v1-preview](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws) is available as a new TTS model in `na`, with both streaming WebSocket and one-shot HTTP synthesis. Browse the voice catalog on the [Soniox TTS voices](/voices/soniox) page.
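
Two entries above change what a request must carry: the Soniox TTS v1 general availability entry and the Deepgram Aura 2 voice selection entry. Below is a minimal migration sketch. It assumes the init config is a flat dict with a top-level `model` key; check each model's API reference for the real message shape. `migrate_model_id` and `has_explicit_voice` are hypothetical helpers.

```python
# Retired identifier -> replacement, taken from the Soniox TTS v1 GA entry.
RETIRED_MODEL_IDS = {"tts-rt-v1-preview": "tts-rt-v1"}


def migrate_model_id(config: dict) -> dict:
    """Return a copy of `config` with a retired `model` identifier replaced."""
    migrated = dict(config)  # shallow copy so the caller's dict is untouched
    model = migrated.get("model")
    if model in RETIRED_MODEL_IDS:
        migrated["model"] = RETIRED_MODEL_IDS[model]
    return migrated


def has_explicit_voice(aura_config: dict) -> bool:
    """Deepgram Aura 2 no longer falls back to a default voice, so `model` must be set."""
    return bool(aura_config.get("model"))


if __name__ == "__main__":
    print(migrate_model_id({"model": "tts-rt-v1-preview"}))  # {'model': 'tts-rt-v1'}
    print(has_explicit_voice({"model": "aura-2-thalia-en"}))  # True
```

The `v1-preview` path segment also changes to `v1`. This sketch leaves URLs alone because the exact path format belongs to the linked reference.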
## Voice catalog pages for Cartesia Sonic 3 and Murf Falcon

You can now browse [Cartesia Sonic 3](/voices/cartesia-sonic-3) and [Murf Falcon](/voices/murf) voices with audio samples directly in the docs. Each entry shows the `voice_id` to pass in your init message.

## Nova 3 multi-language adds EU region

[Deepgram Nova 3 multi-language](/api-reference/stt/deepgram-nova-3/nova-3-multi-language-ws) is now available in `eu`, in addition to `ap-southeast-2` and `us-east-1`. You can route multilingual transcription to European endpoints for lower latency.

## Nova 3 Hindi region change

[Deepgram Nova 3 Hindi](/api-reference/stt/deepgram-nova-3/nova-3-hindi-ws) is no longer available in `ap-south-1`. Use `ap-southeast-2` or `asia-south1` instead.

## Soniox Speech AI Real-time v4

Soniox Speech AI moves to v4. Use the [Speech AI Real-time v4](/api-reference/stt/soniox-speech-ai-real-time-v4/speech-ai-real-time-v4-ws) endpoint for streaming transcription with speaker diarization, automatic language identification, and configurable endpoint detection across 60+ languages. The v3 endpoint has been retired. Point clients at the new path to continue receiving native Soniox token frames.

## LiveKit plugin compatibility refresh

The [LiveKit Agents plugin](/agents/livekit-plugin) now targets `livekit-agents>=1.5.1` and Python 3.10+. You can pass model-specific options as keyword arguments, for example `whisper_params` for Whisper, `target_language_code` for Sarvam STT, or `modelId` and `speakingStyle` for Rime Arcana. New `slng_base_url` and `http_session` arguments let you point at a self-hosted gateway and reuse an `aiohttp.ClientSession`.

## Sarvam Saaras v3 STT not supported in LiveKit plugin

Saaras is HTTP-only on SLNG and has no WebSocket endpoint, so it cannot run through the LiveKit plugin's realtime path. For Hindi voice agents, use `slng/deepgram/nova:3-hi` or `slng/deepgram/nova:3-multi`.
See the [LiveKit plugin provider notes](/agents/livekit-plugin#provider-notes) for details.

## Expanded regions for Murf Falcon TTS

[Murf Falcon](/api-reference/tts/murf-falcon/murf-falcon-ws) is now available in `ap`, `eu-non-eu`, `me`, and `na`, in addition to the existing `eu` world part. You can now route synthesis closer to users across the Americas, Asia Pacific, and the Middle East.

## Asia Pacific region for Soniox Speech AI Real-time v3

[Soniox Speech AI Real-time v3](/api-reference/stt/soniox-speech-ai-real-time-v4/speech-ai-real-time-v4-ws) adds the `ap` world part alongside `eu` and `na`. Route transcription to Asia Pacific endpoints for lower latency in that region.

## URL and presigned S3 inputs for Batch STT

You can now submit audio to the [Batch STT API](/batch-guide) without uploading a file on every request. Pass a publicly accessible `input_url`, or request a presigned S3 URL, upload directly, then create the job with the returned `s3_key`. Both methods accept an optional `metadata` object for attaching arbitrary key-value pairs to a job.

## Batch API usage guide

A new [Batch API guide](/batch-guide) walks through the three input methods (file upload, URL input, and presigned S3 upload) with request flows and sample payloads.

## Deepgram Aura 2 English in eu-north-1

[Deepgram Aura 2 English TTS](/api-reference/tts/deepgram-aura-2/aura-2-english-ws) is now available in `eu-north-1`, in addition to `ap-southeast-2` and `us-east-1`.

## Whisper Large v3 Compressed removed

The Whisper Large v3 Compressed STT model has been retired from the catalog. Use [Whisper Large v3](/api-reference/stt/whisper-large-v3/whisper-large-v3-ws) for multilingual transcription going forward.

## Runtime variables for voice agents

Voice agents can now capture values during a call and reuse them in webhook URLs and system tool arguments. Define a `runtime_variables` array on your agent, and the model sets values through the built-in `set_runtime_variables` tool.
See the [agent configuration examples](/examples/agents-config#runtime-variables) for setup details.

## Webhook HTTP method and payload format

Webhook tools on voice agents now accept `http_method` (`POST`, `PUT`, `PATCH`, or `DELETE`) and `webhook_format` (`envelope` or `raw`). Use `raw` to send only the tool arguments as the request body, skipping the SLNG metadata envelope. Both fields are documented in the [Voice Agents API reference](/api-reference/agents/create-agent).

## Expanded regions for Rime Arcana v2 and Cartesia Sonic 3

[Rime Arcana v2 TTS](/api-reference/tts/rime-arcana-v2/arcana-v2-english-ws) is now available in `eu-north-1` and `us-east-1`, in addition to `ap-southeast-2`. [Cartesia Sonic 3 TTS](/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws) is now available in all three world parts: `ap`, `eu`, and `na`.

## New regions for Deepgram Nova 3 English and Hindi

[Nova 3 English](/api-reference/stt/deepgram-nova-3/nova-3-english-ws) is now available in `ap-south-1` and `us-east-1`. [Nova 3 Hindi](/api-reference/stt/deepgram-nova-3/nova-3-hindi-ws) adds `asia-south1` alongside existing regions.

## Utterance end events on Unmute STT Bridge

The [Unmute STT Bridge](/api-reference/unified-api/unmute-stt-bridge/unmute-stt-bridge-ws) now emits `utterance_end` events when the upstream model signals the end of a spoken utterance. This gives you an explicit boundary marker for segmenting transcription output.

## Native token stream for Soniox Speech AI

[Soniox Speech AI Real-time v3](/api-reference/stt/soniox-speech-ai-real-time-v4/speech-ai-real-time-v4-ws) now returns native Soniox token frames instead of normalized transcripts. You receive interim and final tokens directly, including `` and `` endpoint markers when endpoint detection is enabled.

## Tool personalization for voice agents

You can now use `{{variable}}` placeholders in runtime tool fields: webhook URLs, system webhook argument values, human transfer phone numbers, and built-in timezones.
Values resolve when the tool executes, not at session start, so a missing tool variable does not block the call. Supported surfaces, validation rules, and examples are documented on the [Configuration & Tools](/examples/agents-config#tool-personalization) page.

## Tool execution tracking on agent calls

A new [tool executions endpoint](/api-reference/calls/submit-tool-execution) lets you record webhook, template, human transfer, and built-in tool activity against a call. Execution records (including outcome, duration, and HTTP status) also appear in the [call detail response](/api-reference/calls/get-call).

## Cartesia Sonic 3 TTS

[Cartesia Sonic 3](/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws) is available as a new TTS provider. It supports low-latency streaming synthesis over WebSocket with context-aware generation controls.

## Reson8 STT

[Reson8 STT v1](/api-reference/stt/reson8-stt-v1/reson8-stt-v1-ws) is available as a new STT provider. It supports real-time transcription over WebSocket with word-level timestamps, confidence scores, and partial results in nine languages including Dutch, French, German, and Spanish.

## Deepgram Nova 3 Indic language endpoints

Four new SLNG-hosted Deepgram Nova 3 language variants are available in `ap-south-1` (Mumbai): [Kannada](/api-reference/stt/deepgram-nova-3/nova-3-kannada-ws), [Marathi](/api-reference/stt/deepgram-nova-3/nova-3-marathi-ws), [Tamil](/api-reference/stt/deepgram-nova-3/nova-3-tamil-ws), and [Telugu](/api-reference/stt/deepgram-nova-3/nova-3-telugu-ws). Each has a dedicated WebSocket endpoint for that language.

## Soniox Speech AI version correction

The Soniox STT endpoint is now correctly labeled [Speech AI Real-time v3](/api-reference/stt/soniox-speech-ai-real-time-v4/speech-ai-real-time-v4-ws). URLs and navigation have been updated accordingly.

## Batch speech-to-text API

You can now transcribe audio files asynchronously with the new [Batch STT API](/api-reference/batch).
Upload a file, poll the job status, and download the transcript when ready. Supported formats include wav, mp3, flac, aac, ogg, m4a, mp4, amr, and mpeg. Powered by Speechmatics. ## Murf Falcon TTS [Murf Falcon](/api-reference/tts/murf-falcon/murf-falcon-ws) is available as a new TTS provider. It supports multilingual speech synthesis over WebSocket with multiple encodings and sample rates. ## Unified API documentation The new [Unified API](/execution-layer/unified-api) section explains how to use one endpoint pattern for every STT and TTS model. Swap providers by changing only the URL path. Your auth, request format, and code stay the same. Includes guides on [parameter coverage](/execution-layer/unified-api-parameters) and [supported models](/execution-layer/unified-api-models). ## Integrations hub A new [Integrations](/integrations/overview) page lists third-party platforms you can connect to SLNG. LiveKit, Cognigy, and Jambonz each have dedicated setup paths. ## Whisper Large v3 endpoint consolidated The separate Whisper Large v3 Compressed endpoint has been removed. Use the standard [Whisper Large v3](/api-reference/stt/whisper-large-v3/whisper-large-v3-ws) endpoint, which now handles compressed audio directly. ## Language selection for Nova 3 STT SLNG-hosted Deepgram Nova 3 STT endpoints accept a `language` parameter in the WebSocket `init` config. Supported locales by variant: * **English**: `en`, `en-au`, `en-us`, `en-nz`, `en-gb`, `en-in` * **Spanish**: `es`, `es-us`, `es-419`, `es-ar`, `es-mx`, `es-es` * **Hindi**: `hi`, `en` * **Multi-language**: `multi` The Hindi variant also accepts `en`, so you can transcribe English audio without switching endpoints. See the [Speech-to-Text models page](/models/stt) for the full parameter list. ## More sample rates for Rime Arcana TTS [Rime Arcana](/voices/rime-arcana) now supports 8, 16, 22.05, 24 (default), 32, 44.1, and 48 kHz. You can match your audio pipeline directly without resampling. 
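Rime Arcana accepts only the rates listed above, so a client may want to pick the rate before it opens a session. The sketch below is a minimal example and assumes your pipeline rate is known in Hz. `choose_rime_sample_rate` is a hypothetical helper, not part of an SLNG SDK. It does not show the request field that carries the rate; see the [Rime Arcana](/voices/rime-arcana) page for that.

```python
# Output rates (Hz) listed above for Rime Arcana; 24 kHz is the default.
RIME_ARCANA_RATES_HZ = (8000, 16000, 22050, 24000, 32000, 44100, 48000)


def choose_rime_sample_rate(pipeline_hz: int) -> tuple[int, bool]:
    """Return (rate_to_request, needs_resampling) for a pipeline rate.

    Hypothetical helper: an exact match needs no resampling. Otherwise pick
    the smallest supported rate above the pipeline rate (or the highest
    supported rate if none is above it) and flag that resampling is needed.
    """
    if pipeline_hz in RIME_ARCANA_RATES_HZ:
        return pipeline_hz, False
    higher = [rate for rate in RIME_ARCANA_RATES_HZ if rate > pipeline_hz]
    chosen = min(higher) if higher else max(RIME_ARCANA_RATES_HZ)
    return chosen, True


# Example: an 8 kHz telephony pipeline matches exactly; 11.025 kHz does not.
print(choose_rime_sample_rate(8000))   # (8000, False)
print(choose_rime_sample_rate(11025))  # (16000, True)
```

When there is no exact match, the sketch rounds up. Any conversion your pipeline then performs is a downsample of audio that already has at least the bandwidth you need.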
## Simplified endpointing parameter The `endpointing` parameter on Deepgram STT endpoints now accepts only an integer (milliseconds of silence before finalizing speech). Set it to `0` to disable. Default remains `10`. ## Graceful WebSocket session close Send `{ "type": "close" }` on any [WebSocket](/websockets) connection to shut down cleanly. The server finishes processing remaining audio, then closes. This replaces the previous `cancel` behavior and works across TTS, STT, and [bridges](/execution-layer/unified-api). ## Keepalive for STT streams Send `{ "type": "keepalive" }` on STT WebSocket connections to prevent idle timeouts during pauses. Useful for voice agent sessions where the user goes silent but the connection should stay open. ## Endpointing controls for Deepgram Nova STT Two new parameters on [Deepgram Nova STT models](/models/stt) for tuning speech segmentation: * **`endpointing`**: milliseconds of silence before finalizing speech. Set to `false` to disable. Default: `10`. * **`utterance_end_ms`**: milliseconds of silence between words before an `UtteranceEnd` event. Range: 200–5000 ms, default: 1000 ms. ## India region for Nova 3 Hindi Deepgram Nova 3 Hindi is now available in `ap-south-1` (Mumbai), alongside `ap-southeast-2` (Sydney). Use the `X-Region-Override` header to route to the closest region. See [models by region](/models/by-region). # Agent Infra Source: https://docs.slng.ai/dashboard/agent-infra Use Agent Infra in the SLNG Dashboard to create voice agents from templates, configure prompts and models, test in-browser, and monitor live call traffic. 
Agent Infra is where you create and operate voice agents: * [Agent Infra](https://app.slng.ai/agent-infra) ## Create an agent * [New agent](https://app.slng.ai/agent-infra/new) * Start from a template (Healthcare, Insurance, Financial Services, Hospitality & Travel) or create from scratch * Configure the system prompt, greeting(s), models, and tools * Duplicate an existing agent when you want to reuse its stored configuration * If you plan to dispatch outbound calls or use human transfer tools, attach an outbound connection (see [Telephony](/dashboard/telephony)) Personalization is available in two places when you create an agent: * [Template variables](/examples/agents-config#template-variables) for prompts and greetings * [Tool personalization](/examples/agents-config#tool-personalization) for supported tool fields such as webhook URLs, system webhook argument values, human transfer phone numbers, and built-in timezones Webhook text controls such as descriptions, result instructions, and forced pre-action message text stay static and do not support runtime personalization. Use this mental model when configuring agents in the Dashboard: * Prompt and greeting variables are checked at session start * Tool-only personalization is checked only when that tool runs That means a call can still start even if a tool-only value is missing, but the tool itself will fail if it executes without a valid rendered value. Tool fields that support personalization now show a Personalizable indicator in the editor. Hover it to see the supported syntax, when the value is validated, the required final format, and an example. For the full rules and examples, see [Tool personalization](/examples/agents-config#tool-personalization). For webhook URLs, personalization is limited to path segments and query parameter values. The host, port, credentials, fragments, and query parameter names must remain literal. You can also manage agents programmatically. 
See the [Create agent](/api-reference/agents/create-agent), [Update agent](/api-reference/agents/update-agent-partial), and [List agents](/api-reference/agents/list-agents) API endpoints. ## Test in browser Use the built-in test page for a web (non-telephony) session: * `https://app.slng.ai/agent-infra//test` You can also create web sessions via the API. See [Agent API examples](/examples/agents-api#test-with-a-web-session). ## Calls Each agent has a dedicated calls page where you can monitor and review all call activity: * Calls list: `https://app.slng.ai/agent-infra//calls` * Call details: `https://app.slng.ai/agent-infra//calls/` ### Call list The calls list shows all calls for an agent (both inbound and outbound), with the most recent calls first. You can filter by status to find active, completed, or failed calls. ### Call details Each call record includes: * **Status**: current call state (e.g., dispatched, active, completed, failed) * **Phone number**: the E.164 number involved in the call * **Template arguments**: the values passed for template variables (e.g., `patient_name`, `practice_name`) * **Rendered prompt**: the final system prompt after template variables are resolved * **Timestamps**: when the call was created and last updated * **Session report**: when available, includes transcript and session-level metadata ### Dispatching outbound calls You can dispatch outbound calls from the Agent Infra UI or programmatically via the API. Outbound calls require the agent to have an outbound SIP trunk configured. See [Telephony](/dashboard/telephony). For code examples, see [Agent Call Examples](/examples/agents-calls). Everything in Agent Infra maps to public API endpoints. See the Voice Agents API tab for a complete reference.
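You can catch obvious violations of the webhook URL rule from the Create an agent section above before you save a tool. This is a minimal sketch that uses only Python's standard library. `webhook_url_problems` is hypothetical. It only looks for `{{` or `}}` in each URL component and does not parse Handlebars. The check for a literal `http`/`https` scheme is an extra assumption of this sketch, not a documented rule. The Dashboard editor and the API still decide what is accepted.

```python
from urllib.parse import urlsplit


def has_placeholder(text: str) -> bool:
    # Crude check: any Handlebars-style brace pair counts as a placeholder.
    return "{{" in text or "}}" in text


def webhook_url_problems(url: str) -> list[str]:
    """List the parts of a webhook URL that break the personalization rule.

    Placeholders are tolerated in the path and in query parameter values;
    the host, port, credentials (all inside netloc), query parameter names,
    and fragment must be literal.
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        # Extra assumption of this sketch, not a documented rule.
        problems.append("scheme must be a literal http or https")
    if has_placeholder(parts.netloc):
        problems.append("host, port, and credentials must be literal")
    for pair in parts.query.split("&"):
        name = pair.split("=", 1)[0]
        if has_placeholder(name):
            problems.append(f"query parameter name {name!r} must be literal")
    if has_placeholder(parts.fragment):
        problems.append("fragment must be literal")
    return problems


print(webhook_url_problems("https://hooks.example.com/orders/{{order_id}}?status={{status}}"))
# []
print(webhook_url_problems("https://{{tenant}}.example.com/hook?{{key}}=1"))
# ['host, port, and credentials must be literal', "query parameter name '{{key}}' must be literal"]
```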
# Telephony Source: https://docs.slng.ai/dashboard/telephony Set up SIP trunks and phone-number connections in the SLNG Dashboard for outbound and inbound voice agent calls, including BYOC and managed providers. Connect phone numbers to your voice agents for outbound and inbound calls. Telephony is configured in the [Telephony](https://app.slng.ai/telephony) section of the Dashboard. ## Outbound calling Outbound calls require an outbound connection. Open [Create outbound connection](https://app.slng.ai/telephony/outbound/new) in the Dashboard and configure your trunk or provider. In **Agent Infra**, attach the outbound connection to the agent that will place calls. Without an outbound connection, call dispatch via the API (`POST /v1/agents/{agent_id}/calls`) fails. ## Inbound calling (optional) To route inbound calls to a specific agent, create an inbound connection: * [Create inbound connection](https://app.slng.ai/telephony/inbound/new) # SLNG Voice Agent API examples Source: https://docs.slng.ai/examples/agents-api Code samples for the SLNG Voice Agent API in JavaScript and Python. Create, list, update, test, and delete voice agents through the agent lifecycle. Working code for the agent lifecycle. For prompt design, tool configuration, and template variables, see [Configuring voice agents](/examples/agents-config). ## Placeholders Every snippet below uses these placeholders. Replace them before running the code. | Placeholder | Replace with | | --------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys). Use it as the bearer token in the `Authorization` header. | | `AGENT_ID` / `550e8400-e29b-41d4-a716-446655440000` | The agent's UUID returned by `POST /v1/agents` | | `region: "eu-central"` | A Voice Agents logical region. 
See [Agent regions](/voice-agents#agent-regions) | ## Create an agent ```javascript JavaScript theme={null} const resp = await fetch("https://api.agents.slng.ai/v1/agents", { method: "POST", headers: { Authorization: "Bearer SLNG_API_KEY", "Content-Type": "application/json", }, body: JSON.stringify({ name: "Appointment Reminder", system_prompt: "You are a friendly appointment reminder for {{clinic_name}}. " + "Confirm the upcoming visit with the patient or help them reschedule. " + "Keep responses to one or two sentences.", greeting: "Hi {{patient_name}}, this is a reminder from {{clinic_name}} about your upcoming appointment.", language: "en", region: "eu-central", models: { stt: "slng/deepgram/nova:3-en", llm: "groq/openai/gpt-oss-120b", tts: "slng/deepgram/aura:2-en", tts_voice: "aura-2-thalia-en", }, template_defaults: { clinic_name: "Downtown Health Clinic", patient_name: "there", }, }), }); const agent = await resp.json(); console.log(agent.id); ``` ```python Python theme={null} import requests resp = requests.post( "https://api.agents.slng.ai/v1/agents", headers={"Authorization": "Bearer SLNG_API_KEY"}, json={ "name": "Appointment Reminder", "system_prompt": ( "You are a friendly appointment reminder for {{clinic_name}}. " "Confirm the upcoming visit with the patient or help them reschedule. " "Keep responses to one or two sentences."
), "greeting": "Hi {{patient_name}}, this is a reminder from {{clinic_name}} about your upcoming appointment.", "language": "en", "region": "eu-central", "models": { "stt": "slng/deepgram/nova:3-en", "llm": "groq/openai/gpt-oss-120b", "tts": "slng/deepgram/aura:2-en", "tts_voice": "aura-2-thalia-en", }, "template_defaults": { "clinic_name": "Downtown Health Clinic", "patient_name": "there", }, }, ) agent = resp.json() print(agent["id"]) ``` ```bash cURL theme={null} curl -X POST https://api.agents.slng.ai/v1/agents \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Appointment Reminder", "system_prompt": "You are a friendly appointment reminder for {{clinic_name}}. Confirm the upcoming visit with the patient or help them reschedule. Keep responses to one or two sentences.", "greeting": "Hi {{patient_name}}, this is a reminder from {{clinic_name}} about your upcoming appointment.", "language": "en", "region": "eu-central", "models": { "stt": "slng/deepgram/nova:3-en", "llm": "groq/openai/gpt-oss-120b", "tts": "slng/deepgram/aura:2-en", "tts_voice": "aura-2-thalia-en" }, "template_defaults": { "clinic_name": "Downtown Health Clinic", "patient_name": "there" } }' ``` ## Test with a web session Web sessions let you talk to your agent from a browser. No phone number or SIP trunk required. The response includes LiveKit credentials you can use with any [LiveKit client SDK](https://docs.livekit.io/client-sdk-js/).
```javascript JavaScript theme={null} const resp = await fetch( `https://api.agents.slng.ai/v1/agents/${agent.id}/web-sessions`, { method: "POST", headers: { Authorization: "Bearer SLNG_API_KEY", "Content-Type": "application/json", }, body: JSON.stringify({ arguments: { patient_name: "Maria", clinic_name: "Greenfield Family Medicine" }, }), }, ); const session = await resp.json(); // session.livekit_url: WebSocket URL to connect to // session.livekit_token: access token for the room // session.room_name: LiveKit room name // session.call_id: use this to look up the call later // session.max_session_seconds: how long the session stays open ``` ```python Python theme={null} resp = requests.post( f"https://api.agents.slng.ai/v1/agents/{agent['id']}/web-sessions", headers={"Authorization": "Bearer SLNG_API_KEY"}, json={ "arguments": {"patient_name": "Maria", "clinic_name": "Greenfield Family Medicine"}, }, ) session = resp.json() # session["livekit_url"]: WebSocket URL to connect to # session["livekit_token"]: access token for the room # session["room_name"]: LiveKit room name # session["call_id"]: use this to look up the call later # session["max_session_seconds"]: how long the session stays open ``` ```bash cURL theme={null} curl -X POST https://api.agents.slng.ai/v1/agents/AGENT_ID/web-sessions \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "participant_name": "Test user", "arguments": { "patient_name": "Maria" } }' ``` Both fields are optional. `arguments` overrides `template_defaults` on the agent, and `participant_name` sets a display name in the LiveKit room. ## Update an agent PATCH updates only the fields you send. Use PUT if you want to replace the entire configuration. 
```javascript JavaScript theme={null} const resp = await fetch( `https://api.agents.slng.ai/v1/agents/${agent.id}`, { method: "PATCH", headers: { Authorization: "Bearer SLNG_API_KEY", "Content-Type": "application/json", }, body: JSON.stringify({ greeting: "Hello {{patient_name}}, thanks for picking up.", models: { stt: "slng/deepgram/nova:3-en", llm: "groq/openai/gpt-oss-120b", tts: "slng/deepgram/aura:2-en", tts_voice: "aura-2-thalia-en", llm_kwargs: { temperature: 0.3 }, }, }), }, ); const updated = await resp.json(); ``` ```python Python theme={null} resp = requests.patch( f"https://api.agents.slng.ai/v1/agents/{agent['id']}", headers={"Authorization": "Bearer SLNG_API_KEY"}, json={ "greeting": "Hello {{patient_name}}, thanks for picking up.", "models": { "stt": "slng/deepgram/nova:3-en", "llm": "groq/openai/gpt-oss-120b", "tts": "slng/deepgram/aura:2-en", "tts_voice": "aura-2-thalia-en", "llm_kwargs": {"temperature": 0.3}, }, }, ) updated = resp.json() ``` ```bash cURL theme={null} curl -X PATCH https://api.agents.slng.ai/v1/agents/AGENT_ID \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "greeting": "Hello {{patient_name}}, thanks for picking up.", "models": { "stt": "slng/deepgram/nova:3-en", "llm": "groq/openai/gpt-oss-120b", "tts": "slng/deepgram/aura:2-en", "tts_voice": "aura-2-thalia-en", "llm_kwargs": { "temperature": 0.3 } } }' ``` ## Duplicate an agent Use duplication when you want to copy an existing agent configuration as-is, including prompts, models, tools, template defaults, selected region, and outbound telephony settings. The duplicate does not copy call history or the inbound connection. After duplicating an agent: * Reconnect inbound routing if the duplicate should receive calls. * Verify the outbound trunk before dispatching calls. * Test the duplicate with a web session or a test call. 
```javascript JavaScript theme={null} const duplicated = await fetch( `https://api.agents.slng.ai/v1/agents/${agent.id}/duplicate`, { method: "POST", headers: { Authorization: "Bearer SLNG_API_KEY", "Content-Type": "application/json", }, body: JSON.stringify({ name: `${agent.name} Copy`, }), }, ).then((r) => r.json()); console.log(duplicated.id); ``` ```python Python theme={null} duplicated = requests.post( f"https://api.agents.slng.ai/v1/agents/{agent['id']}/duplicate", headers={"Authorization": "Bearer SLNG_API_KEY"}, json={ "name": f"{agent['name']} Copy", }, ).json() print(duplicated["id"]) ``` ```bash cURL theme={null} curl -X POST https://api.agents.slng.ai/v1/agents/AGENT_ID/duplicate \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "Appointment Reminder Copy" }' ``` ## List and inspect agents ```javascript JavaScript theme={null} // List all agents const agents = await fetch("https://api.agents.slng.ai/v1/agents", { headers: { Authorization: "Bearer SLNG_API_KEY" }, }).then((r) => r.json()); for (const a of agents) { console.log(`${a.name} (${a.id})`); } // Get a single agent and check its template variables const agent = await fetch(`https://api.agents.slng.ai/v1/agents/${agents[0].id}`, { headers: { Authorization: "Bearer SLNG_API_KEY" }, }).then((r) => r.json()); console.log(agent.template_variables); // { clinic_name: { required: true, default: "Downtown Health Clinic" }, ... 
} ``` ```python Python theme={null} # List all agents agents = requests.get( "https://api.agents.slng.ai/v1/agents", headers={"Authorization": "Bearer SLNG_API_KEY"}, ).json() for a in agents: print(f"{a['name']} ({a['id']})") # Get a single agent and check its template variables agent = requests.get( f"https://api.agents.slng.ai/v1/agents/{agents[0]['id']}", headers={"Authorization": "Bearer SLNG_API_KEY"}, ).json() print(agent["template_variables"]) # {'clinic_name': {'required': True, 'default': 'Downtown Health Clinic'}, ...} ``` ## Delete an agent Returns `204 No Content` on success. ```javascript JavaScript theme={null} await fetch(`https://api.agents.slng.ai/v1/agents/${agent.id}`, { method: "DELETE", headers: { Authorization: "Bearer SLNG_API_KEY" }, }); ``` ```python Python theme={null} requests.delete( f"https://api.agents.slng.ai/v1/agents/{agent['id']}", headers={"Authorization": "Bearer SLNG_API_KEY"}, ) ``` ```bash cURL theme={null} curl -X DELETE https://api.agents.slng.ai/v1/agents/AGENT_ID \ -H "Authorization: Bearer SLNG_API_KEY" ``` # Dispatching outbound calls Source: https://docs.slng.ai/examples/agents-calls Send outbound voice agent calls on SLNG with E.164 numbers, SIP trunks, template variables, and call status webhooks. JavaScript and Python code examples. You need an SLNG key and an agent created via the [API](/examples/agents-config) or [Agent Infra](/dashboard/agent-infra). See [Configuring voice agents](/examples/agents-config) for system prompt and tool setup. Outbound calls require the agent to be configured with an outbound SIP trunk (`sip_outbound_trunk_id`). Configure this on the Dashboard's [Telephony](/dashboard/telephony) page and attach it to your agent.
## Placeholders Use these values in the examples: | Placeholder | Replace with | | -------------- | -------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key with access to the Voice Agents API | | `AGENT_ID` | The ID of an agent configured with an outbound SIP trunk | | `+14155551234` | The destination phone number in E.164 format | ## Basic Call Dispatch Send a single outbound call by providing the agent ID and a phone number in E.164 format. ```javascript JavaScript theme={null} const API_KEY = "SLNG_API_KEY"; const BASE_URL = "https://api.agents.slng.ai"; async function dispatchCall(agentId, phoneNumber, args = {}) { const response = await fetch(`${BASE_URL}/v1/agents/${agentId}/calls`, { method: "POST", headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ phone_number: phoneNumber, arguments: args, }), }); if (!response.ok) { const error = await response.json().catch(() => ({})); throw new Error(`Failed to dispatch call: ${error.error || error.detail || response.status}`); } return response.json(); } // Basic call const result = await dispatchCall( "AGENT_ID", "+14155551234", ); console.log(`Call dispatched: ${result.call_id}`); ``` ```python Python theme={null} import requests API_KEY = "SLNG_API_KEY" BASE_URL = "https://api.agents.slng.ai" def dispatch_call(agent_id, phone_number, arguments=None): response = requests.post( f"{BASE_URL}/v1/agents/{agent_id}/calls", headers={ "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json" }, json={ "phone_number": phone_number, "arguments": arguments or {} } ) response.raise_for_status() return response.json() # Basic call result = dispatch_call( "AGENT_ID", "+14155551234" ) print(f"Call dispatched: {result['call_id']}") ``` ```bash cURL theme={null} curl -X POST https://api.agents.slng.ai/v1/agents/AGENT_ID/calls \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ 
"phone_number": "+14155551234" }' ``` ## Calls with Template Variables If your agent's system prompt or greeting contains `{{variables}}`, pass them via the `arguments` field. These override any `template_defaults` set on the agent. `arguments` is bounded to 32 keys, with keys up to 64 characters, values up to 1024 characters, and a combined value payload up to 8192 characters. ```javascript JavaScript theme={null} // Agent system_prompt uses: {{patient_name}}, {{practice_name}} const API_KEY = "SLNG_API_KEY"; const BASE_URL = "https://api.agents.slng.ai"; const response = await fetch(`${BASE_URL}/v1/agents/AGENT_ID/calls`, { method: "POST", headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ phone_number: "+14155551234", arguments: { patient_name: "Maria", practice_name: "Greenfield Family Medicine", }, }), }); if (!response.ok) { const error = await response.json().catch(() => ({})); throw new Error(`Failed to dispatch call: ${error.error || error.detail || response.status}`); } const result = await response.json(); console.log(`Call dispatched: ${result.call_id}`); ``` ```python Python theme={null} import requests API_KEY = "SLNG_API_KEY" BASE_URL = "https://api.agents.slng.ai" # Agent system_prompt uses: {{patient_name}}, {{practice_name}} response = requests.post( f"{BASE_URL}/v1/agents/AGENT_ID/calls", headers={ "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json", }, json={ "phone_number": "+14155551234", "arguments": { "patient_name": "Maria", "practice_name": "Greenfield Family Medicine", }, }, ) response.raise_for_status() result = response.json() print(f"Call dispatched: {result['call_id']}") ``` ```bash cURL theme={null} curl -X POST https://api.agents.slng.ai/v1/agents/AGENT_ID/calls \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "phone_number": "+14155551234", "arguments": { "patient_name": "Maria", "practice_name": "Greenfield Family 
Medicine" } }' ``` ## Batch Call Dispatch When you need to call a list of contacts, you can loop through them one at a time or fire multiple calls in parallel with a concurrency cap. ### Sequential Dispatch ```javascript theme={null} async function dispatchBatch(agentId, contacts) { const results = []; for (const contact of contacts) { try { const result = await dispatchCall(agentId, contact.phone, contact.args); results.push({ phone: contact.phone, success: true, call_id: result.call_id, }); // Small delay between calls to avoid rate limiting await new Promise((resolve) => setTimeout(resolve, 500)); } catch (error) { results.push({ phone: contact.phone, success: false, error: error.message, }); } } return results; } // Usage const contacts = [ { phone: "+14155551234", args: { patient_name: "Maria", practice_name: "Greenfield Family Medicine" }, }, { phone: "+14155555678", args: { patient_name: "John", practice_name: "Greenfield Family Medicine" }, }, { phone: "+14155559012", args: { patient_name: "Jane", practice_name: "Greenfield Family Medicine" }, }, ]; const results = await dispatchBatch( "AGENT_ID", contacts, ); console.log("Batch results:", results); ``` ### Parallel Dispatch (with concurrency limit) Process calls in chunks to stay within rate limits while dispatching faster than sequential. 
```javascript JavaScript theme={null} async function dispatchBatchParallel(agentId, contacts, concurrency = 5) { const results = []; // Process in chunks for (let i = 0; i < contacts.length; i += concurrency) { const chunk = contacts.slice(i, i + concurrency); const chunkResults = await Promise.allSettled( chunk.map((contact) => dispatchCall(agentId, contact.phone, contact.args) .then((result) => ({ phone: contact.phone, success: true, call_id: result.call_id, })) .catch((error) => ({ phone: contact.phone, success: false, error: error.message, })), ), ); results.push(...chunkResults.map((r) => r.value || r.reason)); } return results; } ``` ```python Python theme={null} import asyncio import aiohttp API_KEY = "SLNG_API_KEY" BASE_URL = "https://api.agents.slng.ai" async def dispatch_call_async(session, agent_id, phone_number, arguments=None): async with session.post( f"{BASE_URL}/v1/agents/{agent_id}/calls", headers={ "Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json" }, json={ "phone_number": phone_number, "arguments": arguments or {} } ) as response: if 200 <= response.status < 300: data = await response.json() return {"phone": phone_number, "success": True, "call_id": data["call_id"]} else: error = await response.json() return { "phone": phone_number, "success": False, "error": error.get("error") or error.get("detail") or str(error), } async def dispatch_batch(agent_id, contacts, concurrency=5): semaphore = asyncio.Semaphore(concurrency) async def limited_dispatch(session, contact): async with semaphore: return await dispatch_call_async( session, agent_id, contact["phone"], contact.get("args") ) async with aiohttp.ClientSession() as session: tasks = [limited_dispatch(session, contact) for contact in contacts] return await asyncio.gather(*tasks) # Usage contacts = [ {"phone": "+14155551234", "args": {"customer_name": "John Smith"}}, {"phone": "+14155555678", "args": {"customer_name": "Jane Doe"}}, {"phone": "+14155559012", "args": {"customer_name":
"Bob Wilson"}}, ] results = asyncio.run(dispatch_batch("AGENT_ID", contacts)) for result in results: if result["success"]: print(f"✓ {result['phone']}: {result['call_id']}") else: print(f"✗ {result['phone']}: {result['error']}") ``` ## Common Use Cases Outbound voice agents work well for automated calls that follow a predictable structure. Common examples include: * **Appointment reminders**: confirm or reschedule upcoming visits * **Payment collection**: notify customers of outstanding balances * **Survey and feedback**: gather post-purchase or post-service input * **Lead qualification**: follow up on inbound interest and route warm leads to sales Use [template variables](#calls-with-template-variables) to personalize each call with contact-specific details like names, dates, and amounts. # Configuring SLNG voice agents Source: https://docs.slng.ai/examples/agents-config Configure SLNG voice agents. Write effective system prompts, wire up tools and webhooks, and use template variables to personalize calls at dispatch time. The [API reference](/api-reference) covers every field on the agent object. This page covers the parts that need more context than a schema can give: agent fields, model configuration, system prompt structure, tools, and template variables. You can manage agents through the API or through [Agent Infra](/dashboard/agent-infra). For working code examples (create, update, delete), see [Agent API examples](/examples/agents-api). ## Agent fields ### Required fields | Field | Type | Description | | --------------- | ------ | ------------------------------------------------------------------- | | `name` | string | Unique agent name (max 255 chars) | | `system_prompt` | string | System prompt with Handlebars template support | | `greeting` | string | Default greeting | | `language` | string | Language code (e.g., "en", "es", "fr") | | `region` | string | Deployment region. 
See [Agent regions](/voice-agents#agent-regions) | | `models` | object | Model configuration (see [Models](#models)) | ### Common optional fields | Field | Type | Default | Description | | ----------------------- | ------- | ----------------- | -------------------------------------------------- | | `inbound_greeting` | string | `null` | Greeting override for inbound calls | | `outbound_greeting` | string | `null` | Greeting override for outbound calls | | `sip_inbound_trunk_id` | string | `null` | Inbound connection ID (for inbound calls) | | `sip_outbound_trunk_id` | string | `null` | Outbound connection ID (for outbound calls) | | `enable_interruptions` | boolean | `true` | Allow user to interrupt agent speech | | `idle_nudges` | object | language defaults | Silence recovery prompts and final hangup behavior | | `tools` | array | `[]` | Tools available to the agent | | `template_defaults` | object | `{}` | Default values for template variables | ## Models The `models` object configures the STT, LLM, and TTS models used by the agent runtime: ```json theme={null} { "models": { "stt": "slng/deepgram/nova:3-en", "llm": "groq/openai/gpt-oss-120b", "tts": "slng/deepgram/aura:2-en", "tts_voice": "aura-2-thalia-en", "stt_kwargs": {}, "llm_kwargs": {}, "tts_kwargs": {} } } ``` ### Supported LLMs `llm` accepts one of the following model IDs: | Model | ID | | -------------- | --------------------------------------------- | | Nemotron Super | `bedrock-mantle/nvidia.nemotron-super-3-120b` | | Nemotron Nano | `bedrock-mantle/nvidia.nemotron-nano-3-30b` | | GPT OSS 120b | `groq/openai/gpt-oss-120b` | ### Fallbacks and timeouts ```json theme={null} { "models": { "stt": "slng/deepgram/nova:3-en", "llm": "groq/openai/gpt-oss-120b", "tts": "slng/deepgram/aura:2-en", "tts_voice": "aura-2-thalia-en", "fallbacks": { "stt": ["slng/deepgram/nova:3-multi"], "llm": ["groq/openai/gpt-oss-120b"], "tts": [ { "model": "slng/deepgram/aura:2-en", "voice": "aura-2-asteria-en" } ] }, 
"stt_final_timeout_s": 4.0, "llm_first_token_timeout_s": 6.0, "tts_first_audio_timeout_s": 4.0, "failure_audio_enabled": true } } ``` ## Writing a system prompt A system prompt tells the agent who it is, how to talk, and what to do on the call. We recommend splitting it into five sections: | Section | Purpose | | ------------------------------ | --------------------------------------------------------------- | | **Identity** | Name, role, and reason for calling | | **Style** | Tone, sentence length, filler words | | **Response guidelines** | One question at a time, how to handle mishearing, interruptions | | **Task and conversation flow** | Step-by-step script with `` markers | | **Guardrails** | Topics to avoid, how to handle off-script questions | ```text theme={null} [Identity] - Your name is Sarah, and you are a patient outreach coordinator for {{practice_name}}. - You are calling existing patients who are due for upcoming or overdue appointments, including annual physicals, follow-up visits, and preventive screenings. - Your goal is to help the patient schedule their next visit conveniently and answer basic questions about appointment availability. [Style] - Warm, caring, and conversational. You genuinely want to help the patient stay on top of their health. - Keep every response to 1-2 sentences. This is a phone call, not a lecture. - Use contractions and natural speech. Say "you're" not "you are," say "it's" not "it is." - Use softeners like "of course," "absolutely," "no problem," and "sure thing." - You may use light filler phrases like "Let me see..." or "So..." sparingly to sound natural. - Never use markdown, bullet points, numbered lists, or any text formatting in your responses. Everything you say will be read aloud by a text-to-speech engine and must sound completely natural as spoken language. [Response Guidelines] - Ask only one question at a time. Never stack multiple questions in a single response. 
- Wait for the patient to respond before continuing. - If something is unclear, politely ask for clarification. - Never fabricate appointment availability, medical information, or provider details. - Speak dates and times naturally. Say "Tuesday, March eleventh at two thirty in the afternoon" not "3/11 at 14:30." - When reading back any information like phone numbers or dates, normalize for speech. Say "five five five, eight six seven, five three oh nine" not "5558675309." - If you are not sure you heard the patient correctly, repeat back what you understood and ask for confirmation before proceeding. Speech-to-text can mishear, so always verify names, dates, and times. - If the patient interrupts you mid-sentence, stop speaking immediately and listen. Do not try to finish your previous thought. Acknowledge their input and adjust accordingly. [Task & Conversation Flow] 1. Greet the patient by name using the greeting message. Confirm you are speaking to the right person. 2. If the person confirms they are the patient, continue to step 3. If they say it is someone else or a wrong number, apologize politely and end the call. 3. Let the patient know they are due for a visit. Say something like "I see it's been a while since your last appointment, and we'd love to get you in for a checkup." Ask if they would be interested in scheduling. 4. If the patient says yes, ask what days of the week generally work best for them. 5. Based on their preference, offer two specific options. Use realistic example slots such as "We have a Tuesday, March eleventh at ten in the morning, or a Thursday, March thirteenth at two thirty in the afternoon. Either of those work for you?" 6. Once the patient picks a time, repeat back the full appointment details including the day, date, and time and ask them to confirm. This step is important. 7. If confirmed, let them know they are all set. Mention that they may receive a reminder closer to the date. Ask if they have any questions before you go. 
8. If the patient says they are not interested or want to wait, respect their decision. Let them know they can call the office anytime when they are ready, and wish them well. 9. If the patient asks about a specific doctor or provider, let them know you can check availability for that provider and adjust the offered times accordingly. 10. If the patient asks medical questions such as symptoms, medications, or test results, let them know you are not able to provide medical advice but their care team would be happy to help with that during their visit. [Guardrails] - Never provide medical advice, diagnose symptoms, or discuss test results. - Never access, share, or discuss specific medical records, prescriptions, or treatment details. - Never pressure the patient into scheduling. If they decline, accept it gracefully. - Do not discuss billing, insurance coverage, or payment information. If asked, let them know the office billing team can help with those questions. - If the patient asks about something outside your scope, say something like "That's a great question. The team at the office would be the best ones to help with that." - If the conversation drifts off topic, gently guide it back by saying "I appreciate that. So regarding getting your visit on the calendar, would you like to pick a time?" - If the patient repeats "hello" or "hi" multiple times, respond with "I'm here, can you hear me alright?" - If there is prolonged silence, prompt gently with "Are you still there?" or "Take your time, I'm here whenever you're ready." - Never mention or imply that you are an AI, a bot, or an automated system unless directly asked. ``` A few things worth calling out: * **Keep it spoken.** The output goes through TTS, so write the way you'd actually talk. No markdown formatting, no numbered lists in the agent's responses. * **One question per turn.** Stacking questions confuses both the caller and the STT model. 
* **Use `` markers.** They make the conversation flow explicit for the LLM. * **Normalize for speech.** Phone numbers, dates, and times should be written out the way you'd say them aloud. ## Tools Agents support four tool types. You pass them in the `tools` array when creating or updating an agent. ### Template tools Two built-in templates: `hangup` and `voicemail_detection`. These handle call control without any backend setup. ```json theme={null} { "type": "template", "id": "dc4d6391-dc51-44e3-a816-bf65948290a2", "template": "hangup", "execution_policy": { "pre_action_message": { "enabled": true, "text": "Thank you for your time. Have a great day." } } } ``` The `execution_policy.pre_action_message` tells the agent to say something before executing the tool. For `hangup`, the agent will speak the text and then end the call. You can override the default tool prompt with the `prompt` field if you want to change when the LLM decides to use the tool. ### Current date/time tool Use the `current_datetime` built-in when the agent needs reliable time-sensitive context without calling your backend. ```json theme={null} { "type": "built_in", "id": "f0d84ac5-7988-460c-995d-2ad9621c09bb", "built_in": "current_datetime", "timezone": "Europe/Madrid", "prompt": "Use this whenever you need the current local date, time, day, or UTC offset for the caller." } ``` At runtime this becomes a `get_current_datetime` tool and returns a JSON object with: * `timezone` * `local_datetime` * `local_date` * `local_time` * `day_of_week` * `utc_offset` * `is_dst` Like webhook tools, you can also configure it with `"source": "system"` and `system.triggers` if you want the runtime to inject current date/time context automatically on events like `call_start`. ### Webhook tools Webhook tools call your backend over HTTP. The LLM decides when to invoke them based on the `name` and `description` you provide. Parameters follow JSON Schema. 
```json theme={null} { "type": "webhook", "id": "d8f8c1d7-3cf0-4c95-bb06-f28a2a1d2b50", "name": "check_order", "description": "Look up the status of a customer order", "url": "https://api.yourcompany.com/orders/lookup", "parameters": { "type": "object", "properties": { "order_id": { "type": "string", "description": "The order ID" }, "customer_email": { "type": "string", "description": "Customer email for verification" } }, "required": ["order_id", "customer_email"] }, "auth": { "type": "bearer", "token": "your-api-token" }, "timeout_seconds": 10 } ``` **Authentication**: Two options: * `"auth": { "type": "bearer", "token": "..." }` sends an `Authorization: Bearer <token>` header * `"auth": { "type": "hmac", "secret": "..." }` sends an `X-Signature-256` header (HMAC-SHA256 of the request body) Secrets are write-only. The API never returns them. **Fire-and-forget**: Set `"wait_for_response": false` when you don't need the result back in the conversation, such as logging a call event to your CRM. **Result visibility**: Use `"show_results_to_llm": false` when a webhook should run but the successful response body should stay hidden from the model. Use `llm_result_instructions` to tell the model how to use a successful response when it is visible. `llm_result_instructions` is static configuration and does not support runtime personalization. **URL personalization**: Webhook URLs may use `{{variables}}`, but only in path segments and query parameter values. The scheme, host, port, credentials, fragments, and query parameter names must remain literal. ### System-triggered webhooks By default, webhook tools are `"source": "contextual"`: the LLM calls them during conversation. Set `"source": "system"` to trigger them automatically on specific events instead.
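Contextual and system-triggered webhooks both arrive at your `url` as ordinary HTTP requests, so the receiving side is the same either way. If a tool uses the `hmac` auth option described under **Authentication**, your endpoint should recompute the signature over the raw request body before trusting the payload. The following is a minimal, framework-agnostic sketch using only Python's standard library. It assumes, without confirmation from this page, that `X-Signature-256` carries a hex digest, possibly with a GitHub-style `sha256=` prefix, and that the secret is used as UTF-8 bytes. Check the actual header format before relying on it. `verify_signature` is an illustrative name, not part of any SLNG SDK.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: str, raw_body: bytes, header_value: Optional[str]) -> bool:
    """Check an X-Signature-256 header against HMAC-SHA256(secret, raw_body).

    Assumption: the header holds a hex digest, optionally prefixed with "sha256=".
    """
    if not header_value:
        return False
    received = header_value.strip()
    if received.startswith("sha256="):
        received = received[len("sha256="):]
    # Recompute the signature over the exact bytes that were received.
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("ascii"), received.lower().encode("utf-8"))


if __name__ == "__main__":
    secret = "example-secret"  # hypothetical value; use the secret you configured
    body = b'{"order_id": "A123"}'
    good = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    print(verify_signature(secret, body, good))              # True
    print(verify_signature(secret, body, "sha256=" + good))  # True
    print(verify_signature(secret, b"{}", good))              # False
```

Pass the body bytes exactly as received, before any JSON parsing, since re-serializing the JSON can change whitespace or key order and break the digest. Reject the request (for example with a 401) when the function returns `False`.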
```json theme={null} { "type": "webhook", "id": "a1b2c3d4-5678-9abc-def0-123456789abc", "name": "post_call_summary", "description": "Send a call summary to the CRM after the call ends", "url": "https://api.yourcompany.com/call-summaries", "parameters": { "type": "object", "properties": {} }, "source": "system", "wait_for_response": false, "system": { "triggers": [{ "event": "call_end" }], "arguments": [ { "name": "call_id", "type": "string", "required": true, "source": { "type": "call_id" } }, { "name": "transcript", "type": "transcript_messages", "required": true, "source": { "type": "transcript_messages", "max_messages": 200 } } ] } } ``` Available trigger events: `call_start`, `first_user_message`, `call_end`, `tool_succeeded`, `tool_failed`. For `tool_succeeded` and `tool_failed`, you also need to set `source_tool_id` to specify which tool's outcome fires the trigger. Arguments can pull from several sources: | Source type | What it provides | | -------------------------------- | ---------------------------------------------------------- | | `constant` | A fixed value you define | | `transcript_messages` | Recent conversation transcript (bounded by `max_messages`) | | `call_id`, `room_name`, `job_id` | IDs from the current session | | `agent_id`, `agent_name` | The agent's own metadata | | `phone_number` | Caller's phone number | | `first_user_message` | The first thing the caller said | | `call_end_reason` | Why the call ended | System webhook `description`, `llm_result_instructions`, and `execution_policy.pre_action_message.text` must remain static. Only `url` and `system.arguments[*].source.template` support runtime personalization for webhook tools. ### Human transfer Transfers the caller to a phone number. Requires an outbound SIP trunk on the agent. 
```json theme={null} { "type": "human_transfer", "id": "b2c3d4e5-6789-abcd-ef01-234567890abc", "name": "transfer_to_support", "description": "Transfer the caller to a live support agent", "phone_number": "+14155559999", "execution_policy": { "pre_action_message": { "enabled": true, "text": "Let me connect you to a live support specialist." } } } ``` The LLM uses the `name` and `description` to decide when to transfer, just like webhook tools. ### Idle nudges `idle_nudges` configures what the runtime should do when the caller goes silent for too long. ```json theme={null} { "idle_nudges": { "enabled": true, "first_nudge_delay_seconds": 15, "second_nudge_delay_seconds": 30, "hangup_delay_seconds": 15, "first_nudge_text": "Are you still there?", "second_nudge_text": "I’m still here if you need a moment.", "final_hangup_text": "I’m going to disconnect for now. Feel free to call us back anytime." } } ``` If you omit `idle_nudges`, the backend can apply language-specific defaults when returning the agent configuration. ## Tool personalization Tool personalization uses the same `{{variable}}` syntax as prompts and greetings, but only on runtime-safe tool fields. Tool fields are validated when the tool runs rather than when the call starts. * **Prompt and greeting variables** are start-critical. Missing values can block the session. * **Tool-only variables** are runtime-only. Missing or invalid values fail the tool, not the call. Supported surfaces in v1: | Tool surface | Supports personalization | Notes | | ---------------------------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------ | | Contextual webhook `url` | Yes | Only in path segments and query parameter values. Scheme, host, port, credentials, fragments, and query parameter names must stay literal. | | System webhook `url` | Yes | Only in path segments and query parameter values. 
Scheme, host, port, credentials, fragments, and query parameter names must stay literal. | | System webhook argument values | Yes | Scalar values only in v1 (`string`, `integer`, `number`, `boolean`) | | System webhook `description` | No | Must stay static | | System webhook `llm_result_instructions` | No | Must stay static | | System webhook pre-action message text | No | Must stay static | | Human transfer `phone_number` | Yes | Must resolve to E.164 | | Built-in `timezone` | Yes | Must resolve to an IANA timezone | | Auth secrets | No | Secrets remain literal-only | ```json theme={null} { "tools": [ { "type": "webhook", "id": "2e4e8759-8467-4a53-9c96-0e08112ea13d", "name": "lookup_catalog_item", "description": "Load item details before continuing.", "url": "https://api.example.com/catalog/{{item_id}}/webhook?case={{case_id}}", "parameters": { "type": "object", "properties": {} }, "source": "system", "system": { "triggers": [{ "event": "call_end" }], "arguments": [ { "name": "retry_count", "type": "integer", "source": { "type": "template", "template": "{{max_retries}}" } }, { "name": "vip", "type": "boolean", "source": { "type": "template", "template": "{{is_vip}}" } } ] } }, { "type": "human_transfer", "id": "af5265b7-4a93-4bf5-a56a-7ccb0f3c0145", "name": "transfer_to_support", "phone_number": "+34{{local_number}}" }, { "type": "built_in", "id": "0d2fdfe8-0375-4283-bfe0-e6013ed1df59", "built_in": "current_datetime", "timezone": "Europe/{{timezone_city}}" } ] } ``` Validation rules: * URLs must resolve to a full `http://` or `https://` URL while preserving the configured origin. * URL placeholders are allowed only in path segments and query parameter values, and rejected in scheme, host, port, user info, fragments, and query parameter names. * `phone_number` must resolve to E.164, for example `+34600111222`. * `timezone` must resolve to an IANA timezone, for example `Europe/Madrid`. * Templated system argument values are scalar-only in v1. 
Arrays must remain literal constants. * `description`, `llm_result_instructions`, and pre-action message text are static configuration and do not support runtime personalization. Runtime `arguments` are bounded: * Maximum 32 keys * Maximum key length 64 characters * Maximum value length 1024 characters * Maximum combined value length 8192 characters * Template substitution only. Expressions and formulas are not supported. ## Template variables Use `{{variable_name}}` anywhere in the `system_prompt` or `greeting` to inject values at runtime. There are two layers: 1. **`template_defaults`**: Default values set on the agent itself. These apply to every call unless overridden. 2. **`arguments`**: Per-call overrides passed when [dispatching a call](/examples/agents-calls#calls-with-template-variables). ```json theme={null} // On the agent { "system_prompt": "You are an outreach coordinator for {{practice_name}}. You are calling {{patient_name}}.", "greeting": "Hi, is this {{patient_name}}?", "template_defaults": { "practice_name": "Greenfield Family Medicine", "patient_name": "there" } } ``` ```json theme={null} // When dispatching a call: overrides template_defaults { "phone_number": "+14155551234", "arguments": { "patient_name": "Maria" } } ``` In this example, `patient_name` resolves to "Maria" (from the call arguments) and `practice_name` resolves to "Greenfield Family Medicine" (from the agent defaults). The API response includes a `template_variables` field that lists every `{{variable}}` found in the prompt, along with whether it has a default value. This is useful for validating your templates before dispatching calls. Per-call `arguments` are limited to 32 keys, with keys up to 64 characters, values up to 1024 characters, and a combined value payload up to 8192 characters. ## Runtime variables Use `runtime_variables` when the model needs to capture call-scoped values during the conversation and reuse them later in tool configuration. 
These values: * exist only for the active call or web session * are set by the model through the built-in `set_runtime_variables` tool * can be referenced in webhook URLs and system tool template arguments using the same `{{variable_name}}` syntax ```json theme={null} { "runtime_variables": [ { "name": "order_id", "description": "Customer order identifier gathered during the call." }, { "name": "issue_type", "description": "Short label for the support issue." } ] } ``` ## Next steps Create, test, update, and delete agents with code. Outbound calls, template variables, and batch dispatch. # Speech-to-text HTTP examples Source: https://docs.slng.ai/examples/stt-http Code samples in curl, Python, and Node.js for basic transcription, word timestamps, and diarization. These examples use the Deepgram Nova model; see the [Speech-to-Text models](/models/stt) for other models and endpoints. ## Placeholders The snippets below use these placeholders. Replace them before running the code. | Placeholder | Replace with | | -------------- | --------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) | ## Basic File Transcription Upload an audio file (MP3, WAV, FLAC, OGG, M4A, or WebM) and get back the transcribed text. 
The examples use a local file named `micro-machines.wav`; substitute any supported audio file. ```bash cURL theme={null} curl https://api.slng.ai/v1/stt/slng/deepgram/nova:3-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -F "audio=@micro-machines.wav" ``` ```javascript JavaScript theme={null} const formData = new FormData(); formData.append("audio", audioFile); // a File, e.g. from an <input type="file"> element const response = await fetch( "https://api.slng.ai/v1/stt/slng/deepgram/nova:3-en", { method: "POST", headers: { Authorization: "Bearer SLNG_API_KEY", }, body: formData, }, ); const result = await response.json(); console.log("Transcription:", result.results.channels[0].alternatives[0].transcript); ``` ```python Python theme={null} import requests url = "https://api.slng.ai/v1/stt/slng/deepgram/nova:3-en" headers = {"Authorization": "Bearer SLNG_API_KEY"} with open("micro-machines.wav", "rb") as audio_file: response = requests.post(url, headers=headers, files={"audio": audio_file}) result = response.json() print("Transcription:", result["results"]["channels"][0]["alternatives"][0]["transcript"]) ``` You should get a response like this: ```json theme={null} { "metadata": { "request_id": "e5fed572-6eac-4b70-81f3-21ba1641dd12", "duration": 29.888374, "channels": 1, "model_info": { "1abfe86b-e047-4eed-858a-35e5625b41ee": { "name": "2-general-nova", "version": "2024-01-06.5664", "arch": "nova-2" } } }, "results": { "channels": [ { "alternatives": [ { "transcript": "is the micro machine man presenting the most midget miniature motocator of micro machine...", "confidence": 0.9823751, "words": [ { "word": "is", "start": 0.16, "end": 0.32, "confidence": 0.998 }, { "word": "the", "start": 0.32, "end": 0.40, "confidence": 0.613 }, { "word": "micro", "start": 0.40, "end": 0.64, "confidence": 0.727 }, ... ] } ] } ] } } ``` *** ## Going further You can pass additional form fields to customize the transcription: * **Language**: If you know the language, pass `language=en` (or `es`, `fr`, etc.).
Not all models auto-detect, so setting this explicitly can improve accuracy. * **Diarization**: Pass `diarize=true` to identify different speakers in a multi-speaker recording. The response will include a `speaker` field on each word. Diarization settings differ between models, so check each model's documentation for the exact options. * **Punctuation**: Pass `punctuate=true` to add punctuation to the transcript automatically. For the full parameter list per model, see the [Speech-to-Text API reference](/api-reference/stt/deepgram-nova-3/nova-3-ws). *** ## Next Steps Try real-time speech recognition in your browser, no setup needed Real-time transcription as users speak Browse all STT models and endpoints Endpoint-specific parameters # Speech-to-text WebSocket examples Source: https://docs.slng.ai/examples/stt-websocket Code samples in Python and Node.js for basic transcription, word timestamps, and diarization. You need a working knowledge of the [WebSocket protocol](/websockets). These examples use the Deepgram Nova model; see the [Speech-to-Text models](/models/stt) for other models and endpoints. WebSockets let you transcribe in real time as users speak and receive interim results for immediate feedback. If you only need to transcribe pre-recorded files, [HTTP is simpler](/examples/stt-http). ## Placeholders The snippets below use these placeholders. Replace them before running the code. | Placeholder | Replace with | | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys). The snippets read it from the `SLNG_API_KEY` environment variable.
| | `recording.wav` | A local WAV or raw PCM audio file to transcribe | ## Message Flow Every STT WebSocket session follows this pattern: ```mermaid theme={null} sequenceDiagram participant Client participant SLNG Client->>SLNG: Connect wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en SLNG-->>Client: Connection open Client->>SLNG: { type: "init", config } SLNG-->>Client: { type: "ready", session_id: "..." } Client->>SLNG: binary audio data SLNG-->>Client: { type: "partial_transcript", transcript: "..." } Client->>SLNG: binary audio data SLNG-->>Client: { type: "final_transcript", transcript: "..." } Client->>SLNG: { type: "finalize" } Client->>SLNG: { type: "close" } ``` For the full list of message types and parameters, see the [WebSocket protocol reference](/websockets). *** ## Quick Start Connect, initialize a session, stream an audio file, and print the transcription. You need a WAV or raw PCM file to test with. Any short speech recording works. ```javascript JavaScript theme={null} // npm install ws const WebSocket = require("ws"); const fs = require("fs"); const API_KEY = process.env.SLNG_API_KEY; const AUDIO_FILE = process.argv[2] || "input.wav"; const ws = new WebSocket("wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en", { headers: { Authorization: `Bearer ${API_KEY}` }, }); ws.on("open", () => { // 1. Initialize session ws.send( JSON.stringify({ type: "init", config: { language: "en", sample_rate: 16000, encoding: "linear16", }, }), ); }); ws.on("message", (data) => { const message = JSON.parse(data.toString()); if (message.type === "ready") { console.log("Session ready:", message.session_id); // 2. Read and stream audio file in chunks const audio = fs.readFileSync(AUDIO_FILE); const CHUNK_SIZE = 4096; for (let i = 0; i < audio.length; i += CHUNK_SIZE) { ws.send(audio.slice(i, i + CHUNK_SIZE)); } // 3. 
Signal end of audio ws.send(JSON.stringify({ type: "close" })); } else if (message.type === "partial_transcript") { console.log("Interim:", message.transcript); } else if (message.type === "final_transcript") { console.log("Final:", message.transcript); } else if (message.type === "error") { console.error("Error:", message.message); ws.close(); } }); ws.on("close", () => { console.log("Connection closed"); }); ``` ```python Python theme={null} # pip install "websockets>=14" import asyncio import json import os import sys import websockets CHUNK_SIZE = 4096 async def stt_quickstart(): api_key = os.environ["SLNG_API_KEY"] audio_file = sys.argv[1] if len(sys.argv) > 1 else "input.wav" uri = "wss://api.slng.ai/v1/stt/slng/deepgram/nova:3-en" headers = {"Authorization": f"Bearer {api_key}"} async with websockets.connect(uri, additional_headers=headers) as ws: # 1. Initialize session await ws.send(json.dumps({ "type": "init", "config": { "language": "en", "sample_rate": 16000, "encoding": "linear16", }, })) # Wait for ready before streaming audio ready = json.loads(await ws.recv()) print(f"Session ready: {ready['session_id']}") # 2. Read and stream audio file in chunks with open(audio_file, "rb") as f: while chunk := f.read(CHUNK_SIZE): await ws.send(chunk) # 3. Signal end of audio await ws.send(json.dumps({"type": "close"})) # 4.
Receive transcription results async for message in ws: data = json.loads(message) if data["type"] == "partial_transcript": print(f"Interim: {data['transcript']}") elif data["type"] == "final_transcript": print(f"Final: {data['transcript']}") elif data["type"] == "error": print(f"Error: {data['message']}") break asyncio.run(stt_quickstart()) ``` Run with: ```bash JavaScript theme={null} node stt.js recording.wav ``` ```bash Python theme={null} python stt.py recording.wav ``` *** ## Going further The WebSocket protocol supports several options you can set in the `init` config or take advantage of in the response: * **Interim vs final transcripts**: Partial transcripts update in real time as the user speaks. Final transcripts are confirmed segments that won't change. Use partials for live captions and finals for processing. * **Language**: Pass a `language` code in the init config for better accuracy. Not all models auto-detect. * **Endpointing**: Controls how quickly the API finalizes a transcript after silence. Useful for voice agents where you want fast turn-taking. * **Close vs finalize**: Send `{ "type": "close" }` when you are done to end the session. Use `{ "type": "finalize" }` to flush results mid-session without disconnecting. * **Keep-alive**: For long-running sessions with periods of silence, send `{ "type": "keepalive" }` periodically to prevent idle disconnection. For the full parameter list per model, see the [Speech-to-Text API reference](/api-reference/stt/deepgram-nova-3/nova-3-ws). *** ## Next Steps Try real-time speech recognition in your browser, no setup needed Simpler integration for pre-recorded files Full message types, parameters, and error codes Endpoint-specific parameters # Text-to-speech HTTP examples Source: https://docs.slng.ai/examples/tts-http Generate speech audio with the SLNG TTS HTTP API. Code samples in curl, Python, and Node.js for basic requests, voice selection, and streaming responses. Want to try TTS without setting up code?
[Open the live demo](https://examples-gbcy.onrender.com) and test different models and voices in your browser. These examples use the Deepgram Aura model; swap the endpoint path to use a different provider. ## Placeholders The snippets below use these placeholders. Replace them before running the code. | Placeholder | Replace with | | -------------- | --------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) | ## Basic Request Send text and receive a complete audio file in WAV format: ```bash cURL theme={null} curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "aura-2-thalia-en", "text": "Hello from sunny Barcelona!" }' \ --output hello.wav ``` ```javascript JavaScript theme={null} const response = await fetch( "https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en", { method: "POST", headers: { Authorization: "Bearer SLNG_API_KEY", "Content-Type": "application/json", }, body: JSON.stringify({ model: "aura-2-thalia-en", text: "Hello from sunny Barcelona!", }), }, ); const audioData = await response.arrayBuffer(); // Play or save audioData ``` ```python Python theme={null} import requests url = "https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en" headers = { "Authorization": "Bearer SLNG_API_KEY", "Content-Type": "application/json" } data = { "model": "aura-2-thalia-en", "text": "Hello from sunny Barcelona!", } response = requests.post(url, headers=headers, json=data) audio_data = response.content # Save to file with open("output.wav", "wb") as f: f.write(audio_data) ``` *** ## More Examples ### With Voice Selection (Deepgram Aura) Beyond the basic request, each model accepts its own set of parameters.
For example, to specify a voice in Deepgram Aura: ```bash highlight={6} theme={null} curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "text": "Hello from sunny Barcelona!", "model": "aura-2-theia-en" }' \ --output hello-theia.wav ``` For the complete parameter reference, see the [Text-to-Speech](/api-reference/tts/deepgram-aura-2/aura-2-english-ws) API reference. ### With a Different Model Each model has its own endpoint and may use different parameters. For example, Rime Arcana uses `speaker` instead of `model`: ```bash highlight={1} theme={null} curl https://api.slng.ai/v1/tts/slng/rime/arcana:3-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "text": "Hello from Rime Arcana!", "speaker": "astra" }' \ --output hello-arcana.wav ``` For all available models, endpoints, and their parameters, see the [TTS models page](/models/tts). To control how brand names, acronyms, and domain-specific terms are spoken, add a [pronunciation dictionary](/pronunciation-dictionaries) to your TTS request. For voice catalogs with audio samples, browse the [Voices](/voices/deepgram-aura) section in the sidebar. *** ## Next Steps Try TTS in your browser, no setup needed Stream audio in real time for voice agents Browse all TTS models and endpoints Reuse pronunciation rules across TTS requests Endpoint-specific parameters # Text-to-speech WebSocket examples Source: https://docs.slng.ai/examples/tts-websocket Stream speech with the SLNG TTS WebSocket API. Python and Node.js code samples for sub-100ms latency, mid-sentence interrupt, and continuous text input. You need a working knowledge of the [WebSocket protocol](/websockets). These examples use the Deepgram Aura model; swap the endpoint path to use a different provider. WebSockets let you stream audio with sub-100ms latency and stop mid-sentence, which is what you need for voice agent conversations. 
If you only need to generate audio files, [HTTP is simpler](/examples/tts-http). ## Placeholders The snippets below use these placeholders. Replace them before running the code. | Placeholder | Replace with | | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys). The snippets read it from the `SLNG_API_KEY` environment variable. | ## Message Flow Every TTS WebSocket session follows this pattern: ```mermaid theme={null} sequenceDiagram participant Client participant SLNG Client->>SLNG: Connect wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en SLNG-->>Client: Connection open Client->>SLNG: { type: "init", model, voice, config } SLNG-->>Client: { type: "ready", session_id: "..." } Client->>SLNG: { type: "text", text: "Hello!" } SLNG-->>Client: audio chunk SLNG-->>Client: audio chunk Client->>SLNG: { type: "flush" } SLNG-->>Client: audio chunk SLNG-->>Client: { type: "flushed" } Client->>SLNG: { type: "close" } ``` For the full list of message types and parameters, see the [WebSocket protocol reference](/websockets). To keep acronyms, names, and domain terms consistent during synthesis, set a session default or per-turn [pronunciation dictionary](/pronunciation-dictionaries). *** ## Quick Start Connect, initialize a session, send text, and save the audio to a file. Both examples write an `output.pcm` file containing raw 16-bit PCM audio at 24 kHz.
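Raw PCM has no header, so most audio players won't open `output.pcm` directly. If you'd rather have a regular WAV file, a minimal sketch using Python's standard-library `wave` module can wrap the bytes in a RIFF header. It assumes the format requested in the `init` message (`linear16`, 2 bytes per sample, 24 kHz) and a single channel, which matches the `-ac 1` flag in this page's ffplay command. The function names are illustrative, not part of any SLNG SDK.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)      # mono, matching `ffplay -ac 1`
        wav.setsampwidth(sample_width)  # linear16 = 2 bytes per sample
        wav.setframerate(sample_rate)   # sample_rate from the init config
        wav.writeframes(pcm)
    # Closing the writer finalizes the header but leaves buf open.
    return buf.getvalue()


def pcm_file_to_wav(pcm_path: str, wav_path: str, **fmt) -> None:
    """Read a raw .pcm file and write a playable .wav file."""
    with open(pcm_path, "rb") as src:
        data = pcm_to_wav_bytes(src.read(), **fmt)
    with open(wav_path, "wb") as dst:
        dst.write(data)
```

For example, `pcm_file_to_wav("output.pcm", "output.wav")` produces a file most players can open. If you change `sample_rate` in the `init` config, pass the same value here. Keep the original `output.pcm` if you prefer to play the raw stream.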
You can play it with [ffplay](https://ffmpeg.org/ffplay.html): ```bash theme={null} ffplay -f s16le -ar 24000 -ac 1 output.pcm ``` ```javascript JavaScript theme={null} // npm install ws const WebSocket = require("ws"); const fs = require("fs"); const API_KEY = process.env.SLNG_API_KEY; const ws = new WebSocket("wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en", { headers: { Authorization: `Bearer ${API_KEY}` }, }); const audioChunks = []; ws.on("open", () => { // 1. Initialize session ws.send( JSON.stringify({ type: "init", model: "aura:2-en", voice: "aura-2-thalia-en", config: { encoding: "linear16", sample_rate: 24000, }, }), ); // 2. Send text to convert ws.send( JSON.stringify({ type: "text", text: "Hello! This is a WebSocket TTS example.", }), ); // 3. Flush to get remaining audio ws.send(JSON.stringify({ type: "flush" })); }); ws.on("message", (data, isBinary) => { if (isBinary) { audioChunks.push(data); } else { const message = JSON.parse(data.toString()); if (message.type === "ready") { console.log("Session ready:", message.session_id); } else if (message.type === "flushed") { console.log("All audio received, saving to output.pcm"); fs.writeFileSync("output.pcm", Buffer.concat(audioChunks)); ws.close(); } else if (message.type === "error") { console.error("Error:", message.message); ws.close(); } } }); ``` ```python Python theme={null} # pip install "websockets>=14" import asyncio import json import os import websockets async def tts_quickstart(): api_key = os.environ["SLNG_API_KEY"] uri = "wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en" headers = {"Authorization": f"Bearer {api_key}"} async with websockets.connect(uri, additional_headers=headers) as ws: # 1. Initialize session await ws.send(json.dumps({ "type": "init", "model": "aura:2-en", "voice": "aura-2-thalia-en", "config": { "encoding": "linear16", "sample_rate": 24000, }, })) # 2. Send text to convert await ws.send(json.dumps({ "type": "text", "text": "Hello! This is a WebSocket TTS example.", })) # 3.
Flush to get remaining audio await ws.send(json.dumps({"type": "flush"})) # 4. Receive audio chunks until flushed audio_chunks = [] async for message in ws: if isinstance(message, bytes): audio_chunks.append(message) else: data = json.loads(message) if data["type"] == "ready": print(f"Session ready: {data['session_id']}") elif data["type"] == "flushed": print("All audio received, saving to output.pcm") break elif data["type"] == "error": print(f"Error: {data['message']}") break with open("output.pcm", "wb") as f: for chunk in audio_chunks: f.write(chunk) asyncio.run(tts_quickstart()) ``` *** ## More Examples ### Batch Text Streaming Send multiple sentences for smoother speech instead of one large block. This is a complete example you can run independently. ```javascript JavaScript theme={null} // npm install ws const WebSocket = require("ws"); const fs = require("fs"); const API_KEY = process.env.SLNG_API_KEY; const ws = new WebSocket("wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en", { headers: { Authorization: `Bearer ${API_KEY}` }, }); const text = "The sun rose over the mountains. Birds began to sing in the trees. A gentle breeze carried the scent of wildflowers across the valley."; const audioChunks = []; ws.on("open", () => { ws.send( JSON.stringify({ type: "init", model: "aura:2-en", voice: "aura-2-thalia-en", config: { encoding: "linear16", sample_rate: 24000 }, }), ); // Stream sentences one at a time const sentences = text.split(". "); for (const sentence of sentences) { ws.send( JSON.stringify({ type: "text", text: sentence.endsWith(".") ? 
sentence : sentence + ".", }), ); } ws.send(JSON.stringify({ type: "flush" })); }); ws.on("message", (data, isBinary) => { if (isBinary) { audioChunks.push(data); } else { const message = JSON.parse(data.toString()); if (message.type === "flushed") { console.log("All audio received, saving to output.pcm"); fs.writeFileSync("output.pcm", Buffer.concat(audioChunks)); ws.close(); } } }); ``` ```python Python theme={null} # pip install websockets import asyncio import json import os import websockets async def tts_batch_streaming(): api_key = os.environ["SLNG_API_KEY"] uri = "wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en" headers = {"Authorization": f"Bearer {api_key}"} text = ( "The sun rose over the mountains. " "Birds began to sing in the trees. " "A gentle breeze carried the scent of wildflowers across the valley." ) async with websockets.connect(uri, extra_headers=headers) as ws: await ws.send(json.dumps({ "type": "init", "model": "aura:2-en", "voice": "aura-2-thalia-en", "config": {"encoding": "linear16", "sample_rate": 24000}, })) # Stream sentences one at a time sentences = text.split(". ") for sentence in sentences: if not sentence.endswith("."): sentence += "." 
await ws.send(json.dumps({ "type": "text", "text": sentence, })) await ws.send(json.dumps({"type": "flush"})) # Receive audio chunks until flushed audio_chunks = [] async for message in ws: if isinstance(message, bytes): audio_chunks.append(message) else: data = json.loads(message) if data["type"] == "flushed": print("All audio received, saving to output.pcm") break with open("output.pcm", "wb") as f: for chunk in audio_chunks: f.write(chunk) asyncio.run(tts_batch_streaming()) ``` *** ## Next Steps Try real-time TTS in your browser, no setup needed Full message types, parameters, and error codes Reuse pronunciation rules across TTS requests Simpler integration for non-streaming use cases Endpoint-specific parameters # Adaptive Execution Source: https://docs.slng.ai/execution-layer/adaptive Not every turn deserves the same execution path. The execution layer adapts on four dimensions, from geography and compliance to cost, latency, and workload. Not every turn deserves the same execution path. The layer adapts on four dimensions. ## Geography Route to the nearest region. Data stays in-jurisdiction. A caller in Frankfurt hits EU infrastructure. A caller in Mumbai hits AP South. No configuration needed: this is the default behavior. Override with `X-Region-Override` when you need a specific datacenter, or `X-World-Part-Override` when you need to stay within a geographic zone. See [Regional Execution](/region-override). ## Compliance Execution constraints determine where data and models can operate. Healthcare, banking, and insurance workloads have requirements about where audio can be processed and where transcripts can exist. Regional execution respects these constraints per request. ## Cost and latency Cached path, local inference, or full reasoning, based on the input. The layer makes this decision per turn: * A greeting that has been said a thousand times: cached path. * A simple acknowledgment: local inference, no LLM call. 
* A complex question requiring reasoning: full inference path. Same API call from your side. The path is selected for you. ## Workload Adapt based on model availability, caller context, and interaction characteristics. If a primary model shows elevated latency, traffic can route to a fallback. If a caller's language is better served by a specific STT model, the routing adapts. ## The result A system where cost and latency decrease with usage, while reliability increases. Every call improves routing decisions and cache coverage for the next one. Regulated industries (healthcare, banking, insurance) see the strongest compounding effect. Their workflows have high repetition within each customer's use cases, and the layer learns those patterns. # Bring your own key Source: https://docs.slng.ai/execution-layer/byok Pass your own provider key on requests so billing runs against your provider account, while the SLNG cache still applies on top. If you already have a contract with a text-to-speech provider, you can keep it. Bring Your Own Key (BYOK) lets you pass your own provider key on requests. Your key is forwarded upstream, so billing and rate limits run against your provider account, and [output assembly](/execution-layer/output-assembly) still applies on top. ## Placeholders The snippets below use these placeholders. Replace them before running the code. | Placeholder | Replace with | | ------------------- | --------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) | | `YOUR_PROVIDER_KEY` | A provider key issued by the upstream TTS provider you are calling | ## How output assembly applies Assembly runs before your request is forwarded, so a cache hit never reaches the upstream provider: * **Cache hit**: the cached audio is returned, no provider call, and no billing event. 
* **Cache miss**: the request is sent upstream with your key, the provider bills your account, and the response is cached for future requests. ## Supported providers | Provider | Model | | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Cartesia | [Sonic 3](/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws) | | Deepgram | [Aura 2](/api-reference/tts/deepgram-aura-2/aura-2-english-ws) | | Kugel | [Kugel 1 Turbo](/api-reference/tts/kugel-1-turbo/kugel-1-turbo-ws), [Kugel 1](/api-reference/tts/kugel-1/kugel-1-ws), [Kugel 2](/api-reference/tts/kugel-2/kugel-2-ws) | | Murf | [Falcon](/api-reference/tts/murf-falcon/murf-falcon-ws) | | Sarvam | [Bulbul](/api-reference/tts/sarvam-ai-bulbul-v3/bulbul-v3-ws) | | Soniox | [TTS RT v1](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws) | ## Send a BYOK request Add the `X-Slng-Provider-Key` header alongside your standard SLNG `Authorization` header. ### HTTP ```bash highlight={3} theme={null} curl -X POST "https://api.slng.ai/v1/tts/sarvam/bulbul:v3" \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "X-Slng-Provider-Key: YOUR_PROVIDER_KEY" \ -H "Content-Type: application/json" \ -d '{ "text": "I brought my own key to the party.", "target_language_code": "en-IN", "speaker": "shubh", "model": "bulbul:v3" }' ``` ### WebSocket Set `X-Slng-Provider-Key` as a header on the WebSocket upgrade request. The message flow after the upgrade is unchanged. ```http theme={null} X-Slng-Provider-Key: YOUR_PROVIDER_KEY ``` The browser WebSocket API does not support custom headers. Set the provider key from a server-side WebSocket client. ## Error handling ### HTTP If the upstream provider rejects your key, the provider's error response is returned with this header: ```http theme={null} X-Slng-Auth-Source: client_key ``` That header tells you the failure came from your provider key, not from SLNG. 
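
A minimal Python sketch of the HTTP flow above, using only the standard library. It sends the same Sarvam Bulbul request as the curl example and checks `X-Slng-Auth-Source` on an error response to tell a rejected provider key apart from other failures. The helper names and the `PROVIDER_KEY` environment variable are hypothetical; the response body is returned as raw bytes because its exact format depends on the endpoint.

```python
import json
import os
import urllib.error
import urllib.request

# Same endpoint as the curl example above
SARVAM_TTS_URL = "https://api.slng.ai/v1/tts/sarvam/bulbul:v3"


def build_byok_headers(slng_key: str, provider_key: str) -> dict[str, str]:
    """SLNG auth plus the provider key that SLNG forwards upstream."""
    return {
        "Authorization": f"Bearer {slng_key}",
        "X-Slng-Provider-Key": provider_key,
        "Content-Type": "application/json",
    }


def failed_on_provider_key(headers) -> bool:
    """True if an error response says the upstream provider rejected your key."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-slng-auth-source") == "client_key"


def synthesize_byok(text: str) -> bytes:
    body = json.dumps({
        "text": text,
        "target_language_code": "en-IN",
        "speaker": "shubh",
        "model": "bulbul:v3",
    }).encode()
    request = urllib.request.Request(
        SARVAM_TTS_URL,
        data=body,
        # PROVIDER_KEY is a hypothetical variable holding your provider's key
        headers=build_byok_headers(os.environ["SLNG_API_KEY"], os.environ["PROVIDER_KEY"]),
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if failed_on_provider_key(err.headers):
            raise RuntimeError(f"Provider rejected your key (HTTP {err.code})") from err
        raise
```

Keeping the header check in its own function lets you reuse it wherever you already handle HTTP errors, independent of which client library sends the request.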
### WebSocket Auth failures surface as a WebSocket error frame after the upgrade is accepted, for example `backend_connection_failed`, with the upstream 401 or 403 detail included. ## Billing The upstream provider bills your account directly for BYOK requests. No audio-minute fees apply to BYOK traffic. ## Next steps How assembled-from-cache output cuts cost and latency on repeated text. Route your existing stack through SLNG. # How It Works Source: https://docs.slng.ai/execution-layer/how-it-works SLNG runs three stages between your orchestrator and the models, one for each part of the voice pipeline. Each removes a category of unnecessary compute. SLNG runs three stages between your orchestrator and the models you use, one for each part of the voice pipeline. Each removes a category of unnecessary compute. ## Request lifecycle of a single turn When a caller speaks: 1. **Audio arrives** from the caller via your orchestrator. 2. **STT Routing** selects the transcription model based on language, accent, noise profile, and cost constraints. 3. **Transcript** goes to your orchestrator's LLM, or to SLNG's tiered decisioning. 4. **Tiered Decisioning** determines whether the turn needs full inference, local inference, or can be resolved without calling the LLM. 5. **The response** (LLM, cached, or deterministic) is sent to TTS. 6. **Output Assembly** checks cache, assembles from segments where possible, and generates only what is new. 7. **Audio returns** to the caller. These stages do not have to run in strict sequence. Assembly can begin before the LLM has finished its full response. ## Three stages ### STT Routing `PRIVATE BETA` The first stage. When audio arrives, it is routed to the right STT model for that specific interaction. A Hindi caller in Mumbai gets a different model than an English caller in New York. A noisy environment may route to a model with better noise handling. The layer balances accuracy, latency, and cost per turn. 
See [STT Routing](/execution-layer/stt-routing) for details. ### Tiered Decisioning The second stage. Not every turn needs full LLM inference. A consent disclosure repeated for the hundredth time does not need a reasoning model. It needs a deterministic, low-latency response. Intelligence is allocated where it is needed and skipped where it is not. See [Tiered Decisioning](/execution-layer/tiered-decisioning) for details. ### Output Assembly The third stage. TTS on the execution layer is assembly, not generation. When a turn produces text for speech, the layer checks whether this output, or parts of it, has been assembled before. A miss is synthesized, cached, and served from cache next time. See [Output Assembly](/execution-layer/output-assembly) for details. ## Regional execution All three stages run across a global edge network. By default, requests route to the lowest-latency region. You can pin to specific regions for compliance or performance. See [Regional Execution](/region-override) for details. ## Continuous improvement As more calls flow through the layer, routing decisions sharpen, cache coverage grows, and tier allocation becomes more accurate. Every call makes the next one more efficient. # The Execution Layer Source: https://docs.slng.ai/execution-layer/index The execution layer gives you control of your voice calls. It reduces latency and model cost across the pipeline, using your existing tools, while improving reliability. This layer is built to give you control of your voice calls. It reduces latency and model cost across the voice pipeline, using your existing tools, while improving reliability. ## The problem A 16-turn voice call makes 48 model calls: STT, LLM, and TTS on every turn. Without an execution layer, each of those 48 runs from scratch. At 1M calls per month, that is 48M inference calls, every one generated fresh regardless of whether the output has been produced before. The same consent disclosure. The same hold message. 
The same greeting. Generated from scratch, every time. ## What the execution layer changes Every turn is routed through the execution path it actually needs. Three stages, one for each part of the voice pipeline: | Stage | What it does | | ---------------------- | --------------------------------------------------------------------------------------------- | | **STT Routing** | Route input to the right transcription model, based on language, accent, noise, and cost. | | **Tiered Decisioning** | Determine whether the turn needs full LLM reasoning, local inference, or no inference at all. | | **Output Assembly** | Assemble TTS output from cache and synthesis. Don't generate what already exists. | These stages run across a global network of edge points, with regional execution that keeps data in-jurisdiction. ## The system improves under load Every call through the system improves routing decisions and cache coverage for the next one. Cost and latency decrease with usage. Reliability increases. * More calls, more cache coverage, fewer model calls, lower cost * More patterns observed, better routing decisions, lower latency * More providers configured, more failover options, higher reliability ## What customers see | Metric | Improvement | | --------------------------- | --------------------------- | | End-to-end latency per turn | Up to 48% reduction | | Total pipeline cost | Up to 57% reduction | | Call completion rate | Zero dropped, zero downtime | ## How to integrate SLNG works with the orchestrator and models you already use. The endpoints sit between your orchestrator and your providers. Architecture and the request lifecycle. How path selection works. Connect your orchestrator. # Output Assembly Source: https://docs.slng.ai/execution-layer/output-assembly TTS on the execution layer is assembly, not generation. Output is served from cache when possible and synthesized only when genuinely new. TTS on the execution layer is assembly, not generation. 
When a turn produces text for speech, it is not blindly forwarded to a TTS model. The layer decides how to produce the audio: serving it from cache when possible and generating only what is genuinely new. The goal is never to generate what already exists.

## The principle

In a traditional voice pipeline, every TTS request is a fresh synthesis call. The same greeting, the same compliance disclosure, the same hold message: generated from scratch every time, billed every time.

Output assembly changes this. When audio has already been produced for the same request, it is served from cache, with no upstream model call and no provider billing. When it has not, it is generated, and the result is cached for future requests.

## How it improves over time

Output assembly is not a static cache. It improves as call volume grows:

* **Coverage expands with usage.** Every new phrase your agents speak adds to the cache.
* **Repetition compounds.** Voice agents naturally repeat greetings, confirmations, disclosures, hold messages, and error handling.
* **Industry patterns accelerate improvement.** Regulated industries see the fastest improvement because their workflows have high repetition.
* **Cost decreases structurally.** As coverage grows, fewer turns hit the upstream model.

## What this means

| Without output assembly                     | With output assembly                              |
| ------------------------------------------- | ------------------------------------------------- |
| Every TTS request is a fresh synthesis call | Only novel text triggers synthesis                |
| Cost is constant per turn                   | Cost per turn decreases as coverage grows         |
| Latency depends on the model every time     | Cached responses return from the edge             |
| Switching TTS models means starting over    | Pronunciation rules carry over across model swaps |

## Cache isolation

Every cache entry is scoped and isolated by:

* **Region**: cache is local to the region where the request executes.
* **Customer**: your cache is yours.
No data is shared across SLNG customers. * **Use case**: different agents, voices, and configurations maintain separate cache namespaces. Your audio, your data, your cache, fully isolated. ## Pronunciation dictionaries Before text reaches any TTS model, pronunciation rewrite rules normalize brand names, acronyms, and domain terms, regardless of which model synthesizes the audio. Your rules carry over when you swap TTS models. See [Pronunciation Dictionaries](/pronunciation-dictionaries) for the full API. ## BYOK (Bring Your Own Key) With [BYOK](/execution-layer/byok), you pass your own provider key on TTS requests. Output assembly still applies: a cache hit never reaches the provider, so you are not billed; a miss uses your key for synthesis and caches the result. Your existing provider contracts and volume discounts stay intact. ## Configuration Output assembly is automatic. Every TTS request goes through it. You shape the behavior through: * **Model and voice selection**: determines the cache namespace. * **Pronunciation dictionaries**: control text rewriting before synthesis. * **BYOK**: use your own provider keys with the same assembly logic. * **Region**: determines which edge cache is consulted first. Timeout settings for the TTS stage: | Setting | Default | Description | | --------------------------- | ------- | --------------------------------------------------------------- | | `tts_first_audio_timeout_s` | 4.0 | Maximum wait for first audio from the TTS model before failover | | `failure_audio_enabled` | true | Play a failure audio clip if all TTS paths fail | # STT Routing Source: https://docs.slng.ai/execution-layer/stt-routing The first stage of the execution layer routes each turn's audio to the transcription model best suited to it, balancing accuracy, latency, and cost. STT Routing is in `PRIVATE BETA`. The behavior described here is being rolled out gradually. [Contact us](mailto:support@slng.ai) for access. The first stage of the execution layer. 
When audio arrives, it is routed to the right STT model for that specific interaction. A Hindi caller in Mumbai gets a different model than an English caller in New York. A noisy environment may route to a model with better noise handling. The layer balances accuracy, latency, and cost per turn. ## How it routes Routing weighs several inputs together, not in isolation: * **Language and accent** of the audio * **Noise profile** of the environment * **Regional availability** of models * **Cost and latency** constraints For voice agent calls, routing happens per turn. Each turn can route to the model best suited to that specific audio segment. ## Today Until STT Routing is generally available, you select the STT model explicitly on each request. See the [Speech-to-Text Overview](/stt/overview) for the current model list and how to choose. ## Related The models available today and how to pick one. Where STT routing sits in the pipeline. # Tiered Decisioning Source: https://docs.slng.ai/execution-layer/tiered-decisioning Not every turn in a voice call needs full LLM inference. SLNG allocates reasoning where it's needed and resolves repeatable turns through shorter paths. The LLM-tiering behavior on this page is being rolled out and verified. [Contact us](mailto:support@slng.ai) to discuss availability for your account. Not every turn in a voice call needs full LLM inference. A consent disclosure repeated for the hundredth time. A hold message. A standard greeting. An acknowledgment like "got it, one moment." These do not need a reasoning model. They need a deterministic, low-latency response. Tiered decisioning allocates intelligence where it is needed and skips it where it is not. ## Adaptive execution paths In a traditional voice pipeline, every turn takes the same path: audio in, STT, LLM, TTS, audio out. The same pipeline and the same cost, whether the caller asked a complex question or said "yes." 
The execution layer constructs the path for each turn based on the full context of the conversation: what was said, what has been said before, how the call is progressing, and what the turn actually requires. The decision is not just which model to call, but whether to call a model at all, which stages to invoke, and how to assemble the response.

Same API call from your side. The path is constructed for you.

## Paths adapt in real time

Within a single call, the path can change turn by turn. The first turn might need full reasoning to understand intent. The next few might resolve through shorter paths as the conversation enters a familiar pattern. Then a turn introduces new complexity and the path expands again.

The decision weighs several factors together rather than one at a time: language, intent, similarity to prior turns, availability of cached responses, model load, and regional constraints.

## How it improves over time

The layer learns from every call:

* **Path intelligence sharpens with volume.** Early on, more turns take the full path as a safe default. Over time, a growing proportion route through faster, cheaper paths.
* **Your call patterns shape your optimization.** A healthcare scheduling agent develops different path intelligence than a financial-services collections agent.
* **Improvement is continuous.** You do not retrain anything. The layer observes outcomes, and the next call benefits.
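
To make per-turn path selection concrete, here is a deliberately simplified Python sketch. It is not SLNG's routing logic: the real layer learns its decisions from traffic and considers the full conversation context, while this toy looks only at the current transcript. The `ToyTierRouter` class, the three path names, and the hard-coded acknowledgment list are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Path(Enum):
    CACHED = "cached"  # reuse a response produced for an identical turn
    LOCAL = "local"    # lightweight local handling, no LLM call
    FULL = "full"      # full LLM inference


# Hypothetical turns that never need a reasoning model.
ACKNOWLEDGMENTS = {"yes", "no", "ok", "okay", "got it", "thanks"}


def _normalize(transcript: str) -> str:
    # Ignore case, surrounding whitespace, and trailing punctuation.
    return transcript.strip().lower().rstrip(".!?")


@dataclass
class ToyTierRouter:
    """Illustrative stand-in for learned path selection; not SLNG's algorithm."""

    cache: dict[str, str] = field(default_factory=dict)

    def choose(self, transcript: str) -> Path:
        key = _normalize(transcript)
        if key in self.cache:
            return Path.CACHED
        if key in ACKNOWLEDGMENTS:
            return Path.LOCAL
        return Path.FULL

    def remember(self, transcript: str, response: str) -> None:
        # After a full-path turn, store the response so an identical turn can reuse it.
        self.cache[_normalize(transcript)] = response


router = ToyTierRouter()
print(router.choose("Yes."))                # Path.LOCAL
print(router.choose("What's my balance?"))  # Path.FULL
```

Even this toy shows why cost falls with volume: every `remember` call moves one more turn from the full path to the cached path.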
## What this means | Without tiered decisioning | With tiered decisioning | | ------------------------------------------------- | --------------------------------------------------------- | | Every turn hits the LLM | Only turns that need reasoning hit the LLM | | Latency is constant regardless of turn complexity | Simple turns resolve faster through a shorter path | | Cost scales linearly with call volume | Cost per call decreases as the layer learns your patterns | ## Configuration LLM model selection and failover are configured on the voice agent: ```json theme={null} { "models": { "llm": "groq/moonshotai/kimi-k2-instruct-0905", "fallbacks": { "llm": ["groq/moonshotai/kimi-k2-instruct-0905"] }, "llm_first_token_timeout_s": 6.0 } } ``` | Setting | Default | Description | | --------------------------- | ------- | ------------------------------------------------------------- | | `llm_first_token_timeout_s` | 6.0 | Maximum wait for the first token from the LLM before failover | See [Configuration & Tools](/examples/agents-config#models) for the full model configuration reference. # SLNG Unified API Source: https://docs.slng.ai/execution-layer/unified-api The SLNG Unified API gives you one request format for every STT and TTS model. Swap Deepgram, Rime, Cartesia, and more by changing the URL. One endpoint pattern for every STT and TTS model on the platform. Switch providers by changing the URL path. Your code, auth, and request format don't change. ## Swap models by changing the URL Every request uses the same base URL. 
The only part that changes is the model path: | | Base URL | Model path | | ----------------------------- | ------------------------------------ | ----------------------------- | | Deepgram Nova 3 (proxied) | `api.slng.ai/v1/bridges/unmute/stt/` | **`deepgram/nova:3`** | | Deepgram Nova 3 (SLNG-hosted) | `api.slng.ai/v1/bridges/unmute/stt/` | **`slng/deepgram/nova:3-en`** | | Deepgram Aura 2 | `api.slng.ai/v1/bridges/unmute/tts/` | **`slng/deepgram/aura:2-en`** | | Rime Arcana v3 | `api.slng.ai/v1/bridges/unmute/tts/` | **`slng/rime/arcana:3-en`** | Authentication, request body, and response format are identical. Only the path differs. ## Why a unified interface Most voice AI stacks lock you into one provider's SDK, request format, and response structure. The Unified API removes that friction. | Without SLNG Unified API | With SLNG Unified API | | --------------------------------------- | ---------------------------------------------- | | One SDK per provider | One HTTP/WebSocket endpoint for all | | Different request schemas | Same request body across providers | | Provider-specific error handling | Consistent error responses | | Weeks to evaluate a new model | Change the URL path, test immediately | | Manual upgrades when better models ship | New models available as soon as SLNG adds them | Common patterns: * **A/B testing**: route a percentage of traffic to a different model path with no code change. * **Failover**: switch providers by changing the model path; request format and error handling stay the same. * **Latency optimization**: deploy across regions and route each one to the lowest-latency model. * **Rapid prototyping**: try a new model by changing the URL. The Unified API doesn't hide provider capabilities. Pass voice identifiers, sample rates, encodings, language codes, and speed settings through the `config` object. See [Parameters coverage](/execution-layer/unified-api-parameters) for the full list. 
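
Because only the model path changes, provider switching and A/B testing reduce to string building. A minimal Python sketch, assuming the base URL from the tables on this page; `unified_url` and `pick_model_path` are hypothetical helpers, and hashing the call ID is one possible way to split traffic, not an SLNG feature.

```python
import hashlib

# Host and prefix shared by every Unified API model path
BASE_PATH = "api.slng.ai/v1/bridges/unmute"


def unified_url(kind: str, model_path: str, *, streaming: bool = False) -> str:
    """Build a Unified API URL; only `model_path` differs between providers."""
    if kind not in ("stt", "tts"):
        raise ValueError("kind must be 'stt' or 'tts'")
    scheme = "wss" if streaming else "https"
    return f"{scheme}://{BASE_PATH}/{kind}/{model_path.strip('/')}"


def pick_model_path(call_id: str, control: str, candidate: str, percent: int) -> str:
    """Send roughly `percent`% of calls to `candidate` and the rest to `control`.

    Hashing the call ID keeps each call on the same model across retries.
    """
    bucket = int(hashlib.sha256(call_id.encode()).hexdigest(), 16) % 100
    return candidate if bucket < percent else control


path = pick_model_path("call-1234", "slng/rime/arcana:3-en", "slng/deepgram/aura:2-en", 10)
print(unified_url("tts", path))
```

When the two arms use different TTS models, the `voice` value in the request body must also match the chosen model (for example `luna` for Rime Arcana v3 and `aura-2-asteria-en` for Deepgram Aura 2).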
## Works over HTTP and WebSocket

The same model path works with both protocols:

| Protocol  | URL                                                                 |
| --------- | ------------------------------------------------------------------- |
| HTTP      | `https://api.slng.ai/v1/bridges/unmute/tts/slng/deepgram/aura:2-en` |
| WebSocket | `wss://api.slng.ai/v1/bridges/unmute/tts/slng/deepgram/aura:2-en`   |

Use HTTP for batch jobs and file conversion. Use WebSocket for real-time streaming and voice agents. See [HTTP vs. WebSocket](/protocols) for details.

## Get started

### Prerequisites

* An SLNG key ([get one here](https://app.slng.ai/api-keys))
* `curl` installed (or any HTTP client)

### Authentication

All requests require a Bearer token:

```http theme={null}
Authorization: Bearer SLNG_API_KEY
```

### Text-to-Speech

Generate speech from text. Here's a request using Rime Arcana v3:

```bash theme={null}
curl -X POST https://api.slng.ai/v1/bridges/unmute/tts/slng/rime/arcana:3-en \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice": "luna",
    "text": "Hello from the SLNG Unified API!"
  }' \
  --output hello.wav
```

This saves a WAV file. You can set encoding and sample rate through the `config` object.

To make acronyms, product names, and domain terms sound consistent across TTS models, attach a [pronunciation dictionary](/pronunciation-dictionaries) to the request.

The model is inferred from the URL path. Do not include a duplicate `model` field in the request body unless an endpoint reference explicitly requires it.

Switch to Deepgram Aura 2 by changing the URL path and adapting the voice:

```bash highlight={1} theme={null}
curl -X POST https://api.slng.ai/v1/bridges/unmute/tts/slng/deepgram/aura:2-en \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice": "aura-2-asteria-en",
    "text": "Hello from the SLNG Unified API!"
  }' \
  --output hello.wav
```

### Speech-to-Text

Transcribe audio with SLNG-hosted Deepgram Nova 3:

```bash theme={null}
curl -X POST https://api.slng.ai/v1/bridges/unmute/stt/slng/deepgram/nova:3-en \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -F "audio=@recording.wav" \
  -F "language=en"
```

Switch to proxied Deepgram Nova 3 for 17-language coverage. Only the URL changes:

```bash highlight={1} theme={null}
curl -X POST https://api.slng.ai/v1/bridges/unmute/stt/deepgram/nova:3 \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -F "audio=@recording.wav" \
  -F "language=en"
```

### WebSocket streaming

The same model paths work over WebSocket for real-time streaming. Connect to `wss://` instead of posting to `https://`.

The browser WebSocket API does not support custom headers. Pass the SLNG key as a query parameter or use a server-side WebSocket client. The example below uses the Node.js `ws` library.

```javascript theme={null}
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.slng.ai/v1/bridges/unmute/tts/slng/deepgram/aura:2-en",
  { headers: { Authorization: "Bearer SLNG_API_KEY" } }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    voice: "aura-2-asteria-en",
    text: "Streaming audio in real time."
  }));
});

ws.on("message", (data, isBinary) => {
  // Binary frames contain audio chunks. The ws library delivers text
  // frames as Buffers too, so check isBinary rather than Buffer.isBuffer.
  if (isBinary) {
    process.stdout.write(data);
  }
});
```

For browser-based WebSocket examples, see [TTS over WebSocket](/examples/tts-websocket).

## Next steps

See which parameters each provider supports.
Browse all models available through the Unified API.
When to use each protocol and their trade-offs.

# Supported models

Source: https://docs.slng.ai/execution-layer/unified-api-models

Full list of TTS and STT models reachable through the SLNG Unified API. Deepgram Nova and Aura, Rime Arcana, ElevenLabs, Cartesia, Sarvam, and more.

Every model on the SLNG platform works through the Unified API.
Some run on SLNG infrastructure (SLNG-hosted), others are proxied to the provider's own API. SLNG-hosted models use the `slng/` prefix in the model identifier. They are optimized for low latency in specific regions and billed through your SLNG account with no separate provider account needed. ## Text-to-Speech **Provider:** Rime | **Hosting:** SLNG | **Languages:** English, Hindi Natural prosody with improved voice quality over v1. French and Spanish support planned. | Identifier | Description | | ----------------------- | ----------- | | `slng/rime/arcana:3-en` | English | | `slng/rime/arcana:3-hi` | Hindi | Best for: conversational agents, high-quality voice output. **Provider:** Rime | **Hosting:** SLNG | **Languages:** Arabic, English, French, German, Spanish 150+ voices across 5 languages. Uses unversioned identifiers (`arcana:` instead of `arcana:3-`). | Identifier | Description | | --------------------- | ----------- | | `slng/rime/arcana:en` | English | | `slng/rime/arcana:es` | Spanish | | `slng/rime/arcana:fr` | French | | `slng/rime/arcana:de` | German | | `slng/rime/arcana:ar` | Arabic | Best for: broad language coverage, high-volume production. **Provider:** Deepgram | **Hosting:** SLNG + Deepgram | **Languages:** English, Spanish Available both SLNG-hosted (low latency) and proxied to Deepgram (more languages). The `model` field is required to pick a voice; there is no default. | Identifier | Description | | ------------------------- | --------------------------------- | | `slng/deepgram/aura:2-en` | SLNG-hosted (English) | | `slng/deepgram/aura:2-es` | SLNG-hosted (Spanish) | | `deepgram/aura:2` | Proxied to Deepgram (7 languages) | Best for: low-latency voice apps, multilingual IVR. **Provider:** Cartesia | **Hosting:** Cartesia | **Languages:** 40+ Streaming TTS with controls for speed, volume, and emotion. 
| Identifier | Description | | ------------------ | ------------------- | | `cartesia/sonic:3` | Proxied to Cartesia | Best for: real-time voice agents, expressive multilingual output. **Provider:** Murf | **Hosting:** Murf | **Languages:** 16 locales Real-time multilingual TTS over WebSocket with multiple encodings and sample rates. | Identifier | Description | | --------------------- | --------------- | | `murf/murftts:falcon` | Proxied to Murf | Best for: low-cost voice agents, IVR, assistants. **Provider:** KugelAudio | **Hosting:** KugelAudio | **Languages:** 26 87 voices with expressiveness control across 26 languages. Three variants trade off latency and quality. | Identifier | Description | | -------------------------- | ------------------------------ | | `kugelaudio/kugel:1-turbo` | Lowest latency | | `kugelaudio/kugel:1` | Higher quality, higher latency | | `kugelaudio/kugel:2` | Latest version, EU region | Best for: multilingual voice agents, content creation. **Provider:** Soniox | **Hosting:** Soniox | **Languages:** 50+ Streaming and one-shot synthesis across 50+ languages. | Identifier | Description | | ------------------ | ----------------- | | `soniox/tts-rt:v1` | Proxied to Soniox | Best for: multilingual streaming, long-form synthesis. **Provider:** Sarvam AI | **Hosting:** Sarvam AI | **Languages:** 11 Indian languages Purpose-built for Indian languages with 30+ speaker voices. | Identifier | Description | | ------------------ | -------------------- | | `sarvam/bulbul:v3` | Proxied to Sarvam AI | Best for: India-focused voice apps, regional language support. ## Speech-to-Text **Provider:** Deepgram | **Hosting:** SLNG + Deepgram | **Languages:** English, Spanish, Hindi Available both SLNG-hosted and proxied. 
| Identifier | Description | | ---------------------------- | ------------------------------------- | | `slng/deepgram/nova:3-en` | SLNG-hosted (English) | | `slng/deepgram/nova:3-es` | SLNG-hosted (Spanish) | | `slng/deepgram/nova:3-hi` | SLNG-hosted (Hindi) | | `slng/deepgram/nova:3-multi` | SLNG-hosted (auto-detect) | | `deepgram/nova:3` | Proxied to Deepgram (17 languages) | | `deepgram/nova:3-medical` | Proxied to Deepgram (medical English) | Best for: low-latency streaming, voice agents. **Provider:** Deepgram | **Hosting:** Deepgram | **Languages:** 36 Proven model with VAD, diarization, and smart formatting. | Identifier | Description | | ----------------- | ------------------- | | `deepgram/nova:2` | Proxied to Deepgram | Best for: production transcription, multi-speaker audio. **Provider:** Soniox | **Hosting:** Soniox | **Languages:** 60+ Real-time STT with speaker diarization, automatic language identification, and configurable endpoint detection. Returns native Soniox token frames. | Identifier | Description | | ------------------------ | ----------------- | | `soniox/speech-ai:rt-v4` | Proxied to Soniox | Best for: multilingual streaming, multi-speaker conversations. **Provider:** Reson8 | **Hosting:** Reson8 | **Languages:** 9 European Real-time STT with word-level timestamps, confidence scores, and partial results. Dutch, English, French, German, Italian, Polish, Portuguese, Spanish, Swedish. | Identifier | Description | | --------------------- | ----------------- | | `reson8/reson8stt:v1` | Proxied to Reson8 | Best for: European-language voice agents, low-latency transcription. **Provider:** Sarvam AI | **Hosting:** Sarvam AI | **Languages:** 24 Speech recognition for 24 Indian and European languages with code-mixing support. HTTP-only; not available over WebSocket. 
| Identifier | Description | | ------------------ | -------------------- | | `sarvam/saaras:v3` | Proxied to Sarvam AI | Best for: India-focused batch transcription, code-switched audio. ## Choosing a model | Use case | Recommended STT | Recommended TTS | | -------------------------- | ------------------------- | ----------------------- | | Voice agents (low latency) | `slng/deepgram/nova:3-en` | `slng/rime/arcana:3-en` | | Multilingual transcription | `deepgram/nova:3` | `cartesia/sonic:3` | | Indian languages | `slng/deepgram/nova:3-hi` | `sarvam/bulbul:v3` | | Batch transcription | `deepgram/nova:3` | — | | Medical transcription | `deepgram/nova:3-medical` | — | For voices, regions, and detailed model specs, see the [Model Catalog](/models). # Parameters coverage Source: https://docs.slng.ai/execution-layer/unified-api-parameters Compatibility matrix for the SLNG Unified API. Which TTS and STT request parameters are supported by Deepgram, ElevenLabs, Cartesia, Sarvam, and Rime. The SLNG Unified API accepts a common set of parameters across all providers. Some parameters are universal; others depend on the model. Provider-specific values (like voice identifiers or language codes) are passed through the same fields. The platform forwards them to the underlying provider. ## TTS parameters | Parameter | Type | Required | Description | | -------------------- | ------- | -------- | ------------------------------------------------------------------ | | `text` | string | Yes | Text to synthesize | | `voice` | string | No | Voice identifier. Accepted values depend on the provider and model | | `config.sample_rate` | integer | No | Output sample rate in Hz | | `config.encoding` | string | No | Output encoding format | | `config.language` | string | No | Language code. Accepted values depend on the provider | | `config.speed` | number | No | Speech speed multiplier | ### TTS provider support The matrix below covers the most commonly used TTS models.
Models not listed here (Cartesia Sonic 3, Murf Falcon, Kugel, Soniox TTS v1) accept the same `text` and `voice` fields. Check each model's [API reference](/api-reference) for the exact list of supported `config.*` fields. | Parameter | Rime Arcana | Deepgram Aura 2 | Sarvam Bulbul | | -------------------- | ----------- | --------------- | ------------- | | `voice` | Yes | Yes | Yes | | `config.sample_rate` | Yes | Yes | Yes | | `config.encoding` | Yes | Yes | Yes | | `config.language` | Yes | Yes | Yes | | `config.speed` | — | Yes | — | ### Provider-specific values The `voice` and `config.language` fields accept provider-specific values. For example: * **Rime Arcana** voices: `luna`, `orion`, `astra`, `nova` * **Deepgram Aura 2** voices: `aura-2-thalia-en`, `aura-2-asteria-en`, `aura-2-celeste-es` (set the `voice` field explicitly) * **Rime Arcana** language codes: `en`, `fr`, `es`, `hi`, `ar`, `de` * **Cartesia Sonic 3**, **Murf Falcon**, **Kugel**, **Soniox TTS v1**: see the [voice catalog](/voices) for each provider's `voice_id` values The platform passes these values directly to the provider. If you send a voice identifier that the selected model does not support, the provider returns an error through the standard error response format. ### Supported sample rates and encodings Sample-rate and encoding support varies by model and protocol. The Unified API uses a common request shape, but the selected model still controls which audio formats are valid.
| Model or endpoint family | Common sample rates | Encoding support | Notes | | ---------------------------- | ---------------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------------- | | Unified TTS bridge | `8000`-`48000` Hz, model-dependent | `linear16`, `mp3`, `opus`, `mulaw`, `alaw` | Accepted by the bridge schema; the selected model may still reject unsupported values | | Deepgram Aura 2, proxied | See endpoint reference | `linear16`, `mp3`, `opus` | Use the direct Aura 2 reference for provider-specific behavior | | Deepgram Aura 2, SLNG-hosted | See endpoint reference | `linear16`, `mulaw`, `alaw` | Optimized for real-time and telephony-style output | | Sarvam Bulbul | See endpoint reference | `mp3`, `linear16`, `mulaw`, `alaw`, `opus` | Indian-language TTS provider | | Cartesia Sonic 3 | See endpoint reference | `linear16`, `mulaw`, `alaw` | Streaming TTS over WebSocket | For exact allowed values, use the model-specific API reference page as the source of truth. ## STT parameters | Parameter | Type | Required | Description | | ----------------- | ------- | -------- | ----------------------------------------------------- | | `audio` | file | Yes | Audio file to transcribe | | `language` | string | No | Language code. 
Accepted values depend on the provider | | `sample_rate` | integer | No | Audio sample rate in Hz | | `encoding` | string | No | Audio encoding format | | `enable_vad` | boolean | No | Enable voice activity detection | | `enable_partials` | boolean | No | Enable partial transcription results | ### STT provider support | Parameter | Deepgram Nova 3 | Deepgram Nova 2 | Soniox Speech AI v4 | | ----------------- | --------------- | --------------- | ------------------- | | `language` | Yes | Yes | Auto-detect | | `sample_rate` | Yes | Yes | Yes | | `encoding` | Yes | Yes | Yes | | `enable_vad` | Yes | Yes | Yes | | `enable_partials` | Yes | Yes | Yes | ### Supported STT encodings `linear16`, `mp3`, `opus`. ## How provider-specific parameters work The Unified API uses a schema-driven approach. All parameters are defined in the request schema. There is no generic passthrough or arbitrary key-value mechanism. The `voice`, `language`, and `config` fields accept provider-specific values as strings, and the platform forwards them to the underlying provider without transformation. This means: * You use the same field names regardless of provider * Provider-specific values (voice names, language codes) go into the same fields * Validation happens at the provider level. If a value is not supported, you get a standard error response with the provider's error message ## Model identifiers Models follow the pattern `provider/model:variant` for third-party and `slng/provider/model:variant` for SLNG-hosted. For example: * `deepgram/aura:2`: third-party, proxied to Deepgram * `slng/deepgram/aura:2-en`: SLNG-hosted Deepgram Aura 2 (English) * `slng/rime/arcana:3-en`: SLNG-hosted Rime Arcana v3 (English) For the full list of model identifiers, see [Supported models](/execution-layer/unified-api-models). 
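The naming pattern above can be checked mechanically. The sketch below splits an identifier into its parts and builds the matching HTTP endpoint URL. `ModelId`, `parse_model_id`, and `endpoint_url` are hypothetical helpers, not part of any SLNG SDK; the URL layout is assumed from the `https://api.slng.ai/v1/{tts|stt}/...` examples in these docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelId:
    hosted_by_slng: bool  # True for "slng/provider/model:variant"
    provider: str         # e.g. "deepgram"
    model: str            # e.g. "aura"
    variant: str          # e.g. "2-en"


def parse_model_id(identifier: str) -> ModelId:
    """Split a `provider/model:variant` or `slng/provider/model:variant` identifier."""
    parts = identifier.split("/")
    hosted = parts[0] == "slng"
    if hosted:
        parts = parts[1:]
    if len(parts) != 2 or ":" not in parts[1]:
        raise ValueError(f"unexpected model identifier: {identifier!r}")
    provider = parts[0]
    model, variant = parts[1].split(":", 1)
    if not (provider and model and variant):
        raise ValueError(f"unexpected model identifier: {identifier!r}")
    return ModelId(hosted, provider, model, variant)


def endpoint_url(service: str, identifier: str) -> str:
    """Build the HTTP endpoint for a model, following the curl examples in these docs."""
    if service not in ("tts", "stt"):
        raise ValueError("service must be 'tts' or 'stt'")
    parse_model_id(identifier)  # reject malformed identifiers early
    return f"https://api.slng.ai/v1/{service}/{identifier}"


print(parse_model_id("slng/deepgram/aura:2-en"))
# ModelId(hosted_by_slng=True, provider='deepgram', model='aura', variant='2-en')
print(endpoint_url("stt", "deepgram/nova:3-medical"))
# https://api.slng.ai/v1/stt/deepgram/nova:3-medical
```

This only checks the shape of the string. Whether a given identifier actually exists is decided by the platform, so treat the Supported models list as the reference.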
# Getting started with SLNG Source: https://docs.slng.ai/getting-started Authenticate, send your first text-to-speech and speech-to-text requests, and stream over WebSocket with SLNG in about five minutes. All your speech traffic routes through one platform. The quickest way to see it work is to send a text-to-speech and a speech-to-text request directly. Once both work, you can [point your existing stack at the gateway](/integrations/gateway) and keep the rest of your code. ## Prerequisites * An SLNG account * An SLNG key (get one at [app.slng.ai](https://app.slng.ai/api-keys)) * Basic knowledge of REST APIs or WebSockets ## Authentication Every request needs an SLNG key in the `Authorization: Bearer SLNG_API_KEY` header. Get one from the [Dashboard](https://app.slng.ai/api-keys), and replace `SLNG_API_KEY` in the examples below with your key. For WebSocket auth, key rotation, and bringing your own provider key, see [Authentication & API Keys](/authentication). ## Your First Request A Text-to-Speech (TTS) request over HTTP turns text into an audio file: ```bash theme={null} curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "aura-2-thalia-en", "text": "Hello from sunny Barcelona!" }' \ --output hello.wav ``` Replace `SLNG_API_KEY` with your SLNG key and run the command. Once the request completes, you have a `hello.wav` file in the current directory. Next, transcribe an audio file. The response is text you can process downstream.
Download the sample file [micro-machines.wav](https://docs.slng.ai/audio/micro-machines.wav), then run: ```bash theme={null} curl https://api.slng.ai/v1/stt/slng/deepgram/nova:3-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -F "audio=@micro-machines.wav" \ -F "language=en" ``` Response: ```json theme={null} { "text": "is the micro machine man presenting the most midget miniature motorcade of micro machine...", "transcript": "is the micro machine man presenting the most midget miniature motorcade of micro machine...", "confidence": 0.9749991, "duration": 29.888374, "language": "en", "metadata": { "request_id": "f5778fa4-40f9-4d60-993b-2f43f0164221", "model": "nova-3", "duration": 29.888374, "channels": 1 } } ``` STT endpoints also accept a JSON body with a remote audio `url` instead of a file upload. This example uses the multilingual `nova:3-multi` model: ```bash theme={null} curl https://api.slng.ai/v1/stt/slng/deepgram/nova:3-multi \ -H "Authorization: Bearer SLNG_API_KEY" \ -H 'Content-Type: application/json' \ --data '{"url":"https://docs.slng.ai/audio/micro-machines.wav", "language":"en"}' ``` The response has the same shape as the upload version. ## Go Further Route your existing voice agent through the gateway and keep your code. Native integrations for LiveKit, Cognigy, and Jambonz. How SLNG routes and optimizes every turn between your app and the models. Combine streaming STT, low-latency TTS, and tool-calling to greet, route, and escalate calls. Understand when to use each protocol and their trade-offs. Explore all available TTS and STT models and find the right one for your use case. # SLNG: the execution layer for voice Source: https://docs.slng.ai/index Text-to-speech, speech-to-text, and voice agents through one API. Your orchestrator stays, your models stay. Every turn is routed and optimized for lower cost and lower latency.

The global execution layer for real-time voice

Keep your models. Keep your orchestrator. The execution layer sits between them and cuts cost and latency on every turn.

`npx skills add slng-ai/skills`

Teach your coding agent to build with SLNG, or send your first request by hand.

* 39% less turn latency, end to end
* 53% less model cost
* +16% better call outcomes
* 30+ models behind one API
Drop-in

Plug in the endpoints. Nothing else changes

Your orchestrator, your prompts, and your tools stay exactly as they are. Repoint your speech calls at SLNG, and routing, caching, and failover happen behind the endpoint.

Keep your orchestrator and prompts, and repoint two calls:

* STT: `https://your-current-provider/…` becomes `https://api.slng.ai/v1/stt/slng/deepgram/nova:3-en` (or another STT model path, such as `slng/deepgram/nova:3-multi`)
* TTS: `https://your-current-provider/…` becomes `https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en` (or another TTS model path, such as `slng/rime/arcana:3-en`)
How it works

Every turn takes the path it actually needs

A 16-turn call makes 48 model calls. By default each one runs from scratch, even for a greeting you have synthesized a thousand times. The execution layer changes that across three stages.

Integrations

Drops into your orchestrator

Run SLNG inside the agent framework you already use. Same endpoints, native adapters.

Bring your own keys

Works with the models you already use

Bring your own provider keys and route across 30+ speech models over standard HTTP and WebSocket, without changing your integration.

ISO 27001 certified. HIPAA and GDPR compliant. trust.slng.ai

# Drop-in gateway Source: https://docs.slng.ai/integrations/gateway Point your existing voice stack at the SLNG platform and keep your code. Route speech-to-text and text-to-speech through one base URL to add caching, regional routing, and provider flexibility. The fastest way to add SLNG to an existing voice agent is to route your speech traffic through the platform. You keep your application code, your agent logic, and your call flow. You change only where the speech requests go. ## How it works The SLNG platform exposes one consistent base URL for every model. The path selects the service, provider, and model: ``` https://api.slng.ai/v1/{tts|stt}/{provider}/{model} ``` Send your text-to-speech and speech-to-text requests there with your SLNG key, and you get caching, [regional routing](/region-override), and access to every supported provider behind a single integration. Create an SLNG key at [app.slng.ai/api-keys](https://app.slng.ai/api-keys). Replace your provider's endpoint with the SLNG platform URL and send your SLNG key in the `Authorization` header. ```bash Text-to-speech theme={null} curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "aura-2-thalia-en", "text": "Hello from SLNG!" }' \ --output hello.wav ``` ```bash Speech-to-text theme={null} curl https://api.slng.ai/v1/stt/slng/deepgram/nova:3-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -F "audio=@call.wav" \ -F "language=en" ``` Already have a provider contract? Add the `X-Slng-Provider-Key` header. Billing runs against your provider account, and output assembly still applies. Bring your own key (BYOK) applies to proxied providers, so use a provider path (not a `slng/`-hosted one). For example, Soniox:
For example, Soniox: ```bash highlight={3} theme={null} curl https://api.slng.ai/v1/tts/soniox/tts-rt:v1 \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "X-Slng-Provider-Key: YOUR_PROVIDER_KEY" \ -H "Content-Type: application/json" \ -d '{ "text": "I brought my own key to the party.", "voice": "Adrian", "audio_format": "wav", "sample_rate": 24000 }' \ --output hello.wav ``` See [Bring your own key](/execution-layer/byok) for supported providers, WebSocket setup, and billing. ## What you get Repeated text-to-speech served from cache. Pin traffic to a region with one header. The execution path every turn takes between your app and the models. Switch providers by changing the path. ## Prefer a framework plugin? If your agent runs on LiveKit, Cognigy, or Jambonz, use the native integration instead of calling the platform directly. See [Voice platform integrations](/integrations/overview). # Voice platform integrations Source: https://docs.slng.ai/integrations/overview Use SLNG STT and TTS through LiveKit Agents, Cognigy Voice Gateway, and Jambonz with no protocol rewrites. Region overrides included. Already building on a voice framework? Use SLNG's STT and TTS models through it with a native plugin instead of calling the gateway directly. Each integration translates the platform's native protocol into SLNG calls, so you add caching, regional routing, and provider flexibility without rewriting your agent. If you run a custom stack instead of a framework, point your requests at the [drop-in gateway](/integrations/gateway). ## Region and world-part overrides All integrations route through the SLNG platform, which supports the `X-Region-Override` and `X-World-Part-Override` headers to pin requests to a specific region (for data residency) or to a broader geographic zone like `eu` or `na`. 
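If you call the platform yourself rather than through a plugin, the overrides are ordinary request headers, sent alongside the `X-Slng-Provider-Key` header described on the gateway page. The sketch below assembles them. `build_headers` is a hypothetical helper, and the values `eu-north-1` and `eu` stand in for entries from the full list in Region and world-part overrides.

```python
from typing import Dict, Optional


def build_headers(
    api_key: str,
    region: Optional[str] = None,
    world_part: Optional[str] = None,
    provider_key: Optional[str] = None,
) -> Dict[str, str]:
    """Return request headers for a call to the SLNG platform."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if region:
        # Pin the request to one region, e.g. for data residency
        headers["X-Region-Override"] = region
    if world_part:
        # Pin the request to a broader zone such as "eu" or "na"
        headers["X-World-Part-Override"] = world_part
    if provider_key:
        # Bring your own key: only for proxied (non-slng/) model paths
        headers["X-Slng-Provider-Key"] = provider_key
    return headers


print(build_headers("SLNG_API_KEY", world_part="eu"))
```

Pass the resulting dictionary to whichever HTTP or WebSocket client your stack already uses.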
The [Cognigy](/api-reference/bridges/cognigy-stt-bridge/cognigy-stt-bridge-ws) and [Jambonz](/api-reference/bridges/jambonz-stt-bridge/jambonz-stt-bridge-ws) bridges also accept `?region=` and `?world-part=` query parameters. Use these when the platform does not let you set custom HTTP headers. If you send both, the header wins. The [LiveKit plugin](/agents/livekit-plugin#region-override) wraps this in a `region_override` parameter on the `STT` and `TTS` classes, so you set it once when you instantiate the model. The [Pipecat plugin](/agents/pipecat-plugin#region-routing) exposes the same routing through `region_override` and `world_part_override` on its STT and TTS services. See [Region and world-part overrides](/region-override) for the full region list and header behavior. ## Available integrations Use SLNG models inside LiveKit Agents with the livekit-plugins-slng adapter. Use SLNG models inside Pipecat pipelines with the pipecat-slng STT/TTS services. Connect Cognigy voice bots to SLNG STT and TTS models. Route Jambonz calls through SLNG for speech processing. # Models by Language Source: https://docs.slng.ai/models/by-language Browse SLNG TTS and STT models grouped by supported language: English, Spanish, French, German, Hindi, Japanese, Mandarin, and 30+ Indian languages. Many models support multiple languages. A model listed under one language may also appear under others. See individual model pages for the full list of supported languages and voices. 
## Arabic ## Bengali ## Bulgarian ## Chinese ## Czech ## Danish ## Dutch ## English ## Finnish ## French ## Georgian ## German ## Greek ## Gujarati ## Hebrew ## Hindi ## Hungarian ## Indonesian ## Italian ## Japanese ## Kannada ## Korean ## Malay ## Malayalam ## Marathi ## Norwegian ## Polish ## Portuguese ## Punjabi ## Romanian ## Russian ## Slovak ## Slovenian ## Spanish ## Swedish ## Tagalog ## Tamil ## Telugu ## Thai ## Turkish ## Ukrainian ## Vietnamese ## Indian Languages Models with dedicated support for Indian languages beyond Hindi, including Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu. # Models by Region Source: https://docs.slng.ai/models/by-region See which SLNG TTS and STT models are deployed in each AWS region across North America, Europe, and Asia-Pacific for low-latency routing. Models may be available in multiple regions. SLNG auto-selects the closest region by default. Use the `X-Region-Override` header to pin a specific region, or `X-World-Part-Override` to pin a broader geographic zone (`na`, `eu`, `ap`). See [Region & World-Part Overrides](/region-override) for details. ### Available across Europe ### EU North (eu-north-1) ### AP South (ap-south-1) ### AP Southeast (ap-southeast-2) ### AP South — GCP (asia-south1) ### Sydney (australia-southeast1) We don't have models deployed in South America yet. [Contact us](mailto:support@slng.ai) if you'd like us to host models in this region. We don't have models deployed in the Middle East or Africa yet. [Contact us](mailto:support@slng.ai) if you'd like us to host models in this region. # Model Catalog Source: https://docs.slng.ai/models/index Browse the full catalog of TTS and STT models on SLNG. Search by provider, language, or AWS deployment region for production voice agents. SLNG gives you access to **14 TTS models** and **8 STT models** from **10 providers**, all through a single platform. Browse by type or find which models are deployed in your region. 
## At a Glance ### Text-to-Speech | Model | Provider | Hosting | Latency | Languages | | --------------------------------------------------------------------------- | ---------- | ---------- | ------- | ------------------- | | [Rime Arcana v3](/api-reference/tts/rime-arcana-v3/arcana-v3-english-ws) | Rime | SLNG | Low | 4 languages | | [Rime Arcana v1](/api-reference/tts/rime-arcana-v2/arcana-v2-english-ws) | Rime | SLNG | Low | 5 languages | | [Deepgram Aura 2](/api-reference/tts/deepgram-aura-2/aura-2-english-ws) | Deepgram | SLNG | Low | 2 languages | | [Aura 2 (Proxy)](/api-reference/tts/deepgram-aura-2/aura-2-ws) | Deepgram | Deepgram | Medium | 7 languages | | [Bulbul v3](/api-reference/tts/sarvam-ai-bulbul-v3/bulbul-v3-ws) | Sarvam AI | Sarvam AI | Medium | 11 languages | | [Kugel 1 Turbo](/api-reference/tts/kugel-1-turbo/kugel-1-turbo-ws) | KugelAudio | KugelAudio | Low | 26 languages | | [Kugel 1](/api-reference/tts/kugel-1/kugel-1-ws) | KugelAudio | KugelAudio | Medium | 26 languages | | [Kugel 2](/api-reference/tts/kugel-2/kugel-2-ws) | KugelAudio | KugelAudio | Medium | 26 languages | | [Murf Falcon](/api-reference/tts/murf-falcon/murf-falcon-ws) | Murf | Murf | Low | English and Spanish | | [Cartesia Sonic 3](/api-reference/tts/cartesia-sonic-3/cartesia-sonic-3-ws) | Cartesia | Cartesia | Low | 40+ languages | | [Soniox TTS RT v1](/api-reference/tts/soniox-tts-v1/soniox-tts-v1-ws) | Soniox | Soniox | Low | English | | [Gradium TTS](/api-reference/tts/gradium-tts/gradium-tts-ws) | Gradium | Gradium | Low | 5 languages | | [Kugel 2 Turbo](/api-reference/tts/kugel-2-turbo/kugel-2-turbo-ws) | KugelAudio | KugelAudio | Low | 26 languages | | [Inworld Max 1.5](/api-reference/tts/inworld-max/inworld-max-ws) | Inworld | SLNG | Low | 15 languages | ### Speech-to-Text | Model | Provider | Hosting | Latency | Languages | | --------------------------------------------------------------------------------------------- | --------- | --------- | ------- | 
-------------------- | | [Deepgram Nova 3](/api-reference/stt/deepgram-nova-3/nova-3-multi-language-ws) | Deepgram | SLNG | Low | 4 languages | | [Nova 2 (Proxy)](/api-reference/stt/deepgram-nova-2/nova-2-ws) | Deepgram | Deepgram | Low | 36 languages | | [Nova 3 (Proxy)](/api-reference/stt/deepgram-nova-3/nova-3-ws) | Deepgram | Deepgram | Low | 17 languages | | [Nova 3 Medical](/api-reference/stt/deepgram-nova-3-medical/nova-3-medical-ws) | Deepgram | Deepgram | Low | English | | [Speech AI RT v4](/api-reference/stt/soniox-speech-ai-real-time-v4/speech-ai-real-time-v4-ws) | Soniox | Soniox | Low | 60+ languages | | [Sarvam AI Saaras](/api-reference/stt/sarvam-ai-saaras/saaras-v3-http) | Sarvam AI | Sarvam AI | Medium | 24 languages | | [Reson8 STT v1](/api-reference/stt/reson8-stt-v1/reson8-stt-v1-ws) | Reson8 | Reson8 | Low | 9 European languages | | [Gradium STT](/api-reference/stt/gradium-stt/gradium-stt-ws) | Gradium | Gradium | Low | 5 languages | ## Browse Models 14 models from 9 providers. 8 models from 5 providers. See which models are deployed in your region. Find models that support your target language. # Speech-to-Text Models Source: https://docs.slng.ai/models/stt Browse every STT model on SLNG: Deepgram Nova, Sarvam Saaras, Soniox Speech AI, and Reson8 transcription, grouped by provider. ## SLNG-hosted Models hosted on SLNG infrastructure for optimized latency and throughput. ## Deepgram Direct proxy to Deepgram's STT API. ## Gradium ## Reson8 ## Sarvam AI Indian-language specialist. ## Soniox 60+ language streaming speech recognition. # Text-to-Speech Models Source: https://docs.slng.ai/models/tts Browse every TTS model on SLNG: Cartesia, Deepgram Aura, ElevenLabs, KugelAudio, Murf, Rime Arcana, Sarvam, and Soniox voices. ## SLNG-hosted Models hosted and optimized on SLNG infrastructure for the lowest latency. ## Cartesia ## Deepgram Direct proxy to Deepgram's TTS API. ## Gradium ## KugelAudio Multilingual TTS with expressiveness control. 
## Murf ## Sarvam AI Indian-language specialist. ## Soniox # Pronunciation dictionaries Source: https://docs.slng.ai/pronunciation-dictionaries Create reusable pronunciation dictionaries and attach them to any SLNG TTS request so brand names, acronyms, and domain terms are spoken the way you expect. Pronunciation dictionaries let you control how text is spoken before it reaches the selected TTS model. Create a dictionary once, then reference it from HTTP, WebSocket, or Unified TTS requests. Use them when a voice needs to pronounce acronyms, product names, customer names, jargon, or multilingual terms consistently across TTS models. ## Placeholders The snippets below use these placeholders. Replace them before running the code. | Placeholder | Replace with | | ------------------------ | --------------------------------------------------------------------------- | | `SLNG_API_KEY` | An SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) | | `support-pronunciations` | The dictionary `name` you create with `POST /v1/pronunciation/dictionaries` | | `pd_01abc...` | A dictionary `dictionary_id` returned at creation time | Pronunciation dictionaries currently support rewrite mode. SLNG rewrites matching words or phrases before synthesis, and the rewritten text is what the selected model receives. ## How it works Each dictionary belongs to the organization resolved from your SLNG key. Requests from another organization cannot read or use it. The basic flow is: 1. Create a dictionary with rewrite rules. 2. Reference that dictionary by `name` or `dictionary_id`. 3. Send a TTS request with a `pronunciation` object. Only one active pronunciation dictionary can apply to a request or WebSocket turn. 
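To check a dictionary locally before you upload it, you can approximate rewrite mode in a few lines. This is a sketch, not SLNG's implementation. `rewrite` is a hypothetical helper that follows the rules listed under Rewrite matching on this page: case-insensitive, whole words or phrases only, longer phrases first, and one non-recursive pass. It assumes regex `\b` word boundaries, which may not match the platform's tokenization for terms that start or end with punctuation.

```python
import re
from typing import Dict, List


def rewrite(text: str, rules: List[Dict[str, str]]) -> str:
    """Approximate rewrite mode: apply {"match", "replace"} rules in one pass."""
    if not rules:
        return text
    # Try longer phrases first so "ACH transfer" beats "ACH" at the same position.
    ordered = sorted(rules, key=lambda r: len(r["match"]), reverse=True)
    replacements = {r["match"].lower(): r["replace"] for r in ordered}
    # \b restricts matches to whole words or phrases; IGNORECASE makes them case-insensitive.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(r["match"]) for r in ordered) + r")\b",
        re.IGNORECASE,
    )
    # re.sub scans the input once, left to right, and never re-scans replaced text.
    return pattern.sub(lambda m: replacements[m.group(0).lower()], text)


rules = [
    {"match": "SLNG", "replace": "slang"},
    {"match": "QubePay", "replace": "cube pay"},
    {"match": "ACH transfer", "replace": "ay see aitch transfer"},
    {"match": "ACH", "replace": "ay see aitch"},
]
print(rewrite("Thanks for calling SLNG support. I found your QubePay ACH transfer, "
              "and the next invoice will arrive on Friday.", rules))
```

The rules in the usage lines are the ones from the example dictionary on this page. The printed sentence matches the rewritten text this page shows for the HTTP example.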
## Create a dictionary Create dictionaries with the [pronunciation dictionary API reference](/api-reference/tts/pronunciation-dictionaries/create-pronunciation-dictionary-http): ```bash theme={null} curl -X POST https://api.slng.ai/v1/pronunciation/dictionaries \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "name": "support-pronunciations", "metadata": { "language": "en-US", "use_case": "support voice agent" }, "modes": { "rewrite": { "rules": [ { "match": "SLNG", "replace": "slang" }, { "match": "QubePay", "replace": "cube pay" }, { "match": "ACH transfer", "replace": "ay see aitch transfer" }, { "match": "ACH", "replace": "ay see aitch" } ] } } }' ``` A successful response includes the dictionary `id`, normalized name, metadata, modes, content hash, and creation timestamp: ```json theme={null} { "id": "pd_01abc...", "org_id": "org_123", "name": "support-pronunciations", "normalized_name": "support-pronunciations", "metadata": { "language": "en-US", "use_case": "support voice agent" }, "modes": { "rewrite": { "rules": [ { "match": "SLNG", "replace": "slang" }, { "match": "QubePay", "replace": "cube pay" }, { "match": "ACH transfer", "replace": "ay see aitch transfer" }, { "match": "ACH", "replace": "ay see aitch" } ] } }, "content_hash": "sha256:...", "created_at": "2026-05-15T12:00:00.000Z" } ``` Dictionary names must be unique within your organization. Names can contain letters, numbers, `.`, `_`, and `-`, and can be up to 128 characters. ## Hear the difference Use the same text with and without the dictionary: ```text theme={null} Thanks for calling SLNG support. I found your QubePay ACH transfer, and the next invoice will arrive on Friday. 
``` ## Manage dictionaries For request and response schemas, see the generated API reference pages for [listing dictionaries](/api-reference/tts/pronunciation-dictionaries/list-pronunciation-dictionaries-http), [reading one dictionary](/api-reference/tts/pronunciation-dictionaries/get-pronunciation-dictionary-http), and [deleting a dictionary](/api-reference/tts/pronunciation-dictionaries/delete-pronunciation-dictionary-http). List dictionaries: ```bash theme={null} curl -s https://api.slng.ai/v1/pronunciation/dictionaries \ -H "Authorization: Bearer SLNG_API_KEY" ``` Get one dictionary by name: ```bash theme={null} curl -s https://api.slng.ai/v1/pronunciation/dictionaries/support-pronunciations \ -H "Authorization: Bearer SLNG_API_KEY" ``` Delete a dictionary: ```bash theme={null} curl -X DELETE https://api.slng.ai/v1/pronunciation/dictionaries/support-pronunciations \ -H "Authorization: Bearer SLNG_API_KEY" ``` ## Use a dictionary with HTTP TTS Add a `pronunciation` object to the TTS request body: ```bash theme={null} curl -X POST https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \ -H "Authorization: Bearer SLNG_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "aura-2-thalia-en", "text": "Thanks for calling SLNG support. I found your QubePay ACH transfer, and the next invoice will arrive on Friday.", "pronunciation": { "mode": "rewrite", "name": "support-pronunciations" } }' ``` You can also reference the dictionary by immutable ID: ```json theme={null} { "pronunciation": { "mode": "rewrite", "dictionary_id": "pd_01abc..." } } ``` Rules for the request object: * `mode` must be `"rewrite"` * provide exactly one of `name` or `dictionary_id` * use only one active dictionary per request With the example dictionary above, the selected model receives this rewritten text: ```text theme={null} Thanks for calling slang support. I found your cube pay ay see aitch transfer, and the next invoice will arrive on Friday. 
``` ## Use a dictionary with WebSocket TTS Set a default dictionary when you initialize the session: ```json theme={null} { "type": "init", "config": { "pronunciation": { "mode": "rewrite", "name": "support-pronunciations" } } } ``` Then send text normally: ```json theme={null} { "type": "text", "text": "Thanks for calling SLNG support. I found your QubePay ACH transfer, and the next invoice will arrive on Friday." } ``` To change dictionaries for a later turn, include `pronunciation` on the `text` message: ```json theme={null} { "type": "text", "text": "Use another dictionary for this turn", "pronunciation": { "mode": "rewrite", "name": "finance-pronunciations" } } ``` `init.config.pronunciation` sets the session default. `text.pronunciation` replaces the active dictionary for that turn, and later text turns reuse the most recent active dictionary. ## Use a dictionary with Unified TTS Use the same `pronunciation` shape with Unified TTS bridge requests: ```json theme={null} { "type": "init", "model": "slng/deepgram/aura:2-en", "config": { "language": "en-US", "sample_rate": 24000, "encoding": "linear16", "pronunciation": { "mode": "rewrite", "name": "support-pronunciations" } } } ``` ## Rewrite matching Rewrite mode is deterministic: * matching is case-insensitive * only whole words or whole phrases are matched * longer phrases win before shorter matches * rewriting is single-pass and non-recursive For example, this dictionary prefers `ACH transfer` over the shorter `ACH` match: ```json theme={null} { "modes": { "rewrite": { "rules": [ { "match": "ACH", "replace": "ay see aitch" }, { "match": "ACH transfer", "replace": "ay see aitch transfer" } ] } } } ``` Input: ```text theme={null} I found your ACH transfer ``` Rewritten result: ```text theme={null} I found your ay see aitch transfer ``` ## Limits and errors Current limits: | Limit | Value | | ------------------------------ | ------------------ | | Dictionary name | 128 characters | | Rewrite rules per dictionary 
| 256 | | IPA rules per dictionary | 256 | | `match` length | 128 characters | | `replace` length | 256 characters | | `ipa` length | 256 characters | | Runtime text input for rewrite | 20,000 characters | | Runtime rewritten output | 100,000 characters | Pronunciation resolution fails closed. If the dictionary cannot be found or resolved, the request or WebSocket turn is not sent to the selected TTS model. Common HTTP errors: | Status and code | Meaning | | ----------------------------------- | ------------------------------------------------------------------------- | | `400 invalid_pronunciation` | Malformed object, invalid name, unsupported mode, or dictionary not found | | `401 pronunciation_unauthenticated` | Missing or unresolved organization context | | `409 pronunciation_conflict` | Dictionary name already exists in the organization | | `503 pronunciation_unavailable` | Platform storage or dependency failure during dictionary resolution | Common WebSocket failures return an error frame: ```json theme={null} { "type": "error", "code": "pronunciation_not_found", "message": "Pronunciation dictionary not found: support-pronunciations", "slng_request_id": "..." } ``` ## Current limitations * Only `mode: "rewrite"` is executable today. * `modes.ipa` can be stored but is not executed. * There is no automatic fallback from IPA to rewrite. * Provider-native pronunciation dictionary uploads are not supported through SLNG. ## Recommended pattern For most applications, create a stable dictionary such as `support-pronunciations` and reuse it by name. Use `dictionary_id` only when your application needs an immutable machine reference. Keep dictionaries scoped to a domain, product line, or voice style. For WebSocket sessions, set the default dictionary in `init`, then override individual turns only when needed. # HTTP vs. WebSocket protocols Source: https://docs.slng.ai/protocols Compare HTTP and WebSocket protocols on SLNG. 
Latency, flow, complexity, and when to use each for text-to-speech and speech-to-text workloads.

Every TTS and STT model is reachable over both HTTP and WebSocket. The endpoint path stays the same; only the scheme changes (`https://` for HTTP, `wss://` for WebSocket).

The snippets on this page use `SLNG_API_KEY` as a placeholder. Replace it with an SLNG key from [app.slng.ai/api-keys](https://app.slng.ai/api-keys) before running the code.

## At a Glance

|                | HTTP                               | WebSocket                          |
| -------------- | ---------------------------------- | ---------------------------------- |
| **Flow**       | Request → wait → complete response | Open connection → stream both ways |
| **Latency**    | 200–500 ms                         | Sub-100 ms                         |
| **Best for**   | Batch jobs, file conversion        | Voice agents, live transcription   |
| **Complexity** | One `curl` call                    | Connection lifecycle to manage     |

## How Each Protocol Works

### HTTP

You send a request and get back the full result once processing finishes.

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant SLNG
    Client->>SLNG: POST /v1/tts/slng/deepgram/aura:2-en<br/>{ "text": "Hello" }
    Note over SLNG: Generates full audio
    SLNG-->>Client: 200 OK – complete WAV file
```

**Use HTTP when you:**

* Generate audio files for download or storage
* Transcribe pre-recorded audio files
* Want the simplest possible integration
* Don't need real-time streaming

```bash theme={null}
curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aura-2-thalia-en",
    "text": "Hello from HTTP!"
  }' \
  --output hello.wav
```

See complete examples: [TTS over HTTP](/examples/tts-http) · [STT over HTTP](/examples/stt-http)
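The same HTTP request in Python, as a minimal sketch that uses only the standard library. It reuses the URL, headers, and JSON body from the curl example and assumes, as the diagram above shows, that the response body is the complete audio file. `build_tts_request` and `synthesize_to_file` are illustrative helper names, not part of an SLNG SDK.

```python theme={null}
import json
import urllib.request

# Same endpoint as the curl example above.
TTS_URL = "https://api.slng.ai/v1/tts/slng/deepgram/aura:2-en"


def build_tts_request(
    api_key: str, text: str, model: str = "aura-2-thalia-en"
) -> urllib.request.Request:
    """Build (but don't send) the HTTP TTS request with the curl example's body."""
    body = json.dumps({"model": model, "text": text}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(api_key: str, text: str, path: str) -> None:
    """Send the request and write the complete audio response to `path`."""
    request = build_tts_request(api_key, text)
    # urlopen raises urllib.error.HTTPError for non-2xx responses, such as a 401.
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()  # HTTP returns the whole file in one response
    with open(path, "wb") as f:
        f.write(audio)
```

To produce the same `hello.wav` as the curl call, run `synthesize_to_file("SLNG_API_KEY", "Hello from HTTP!", "hello.wav")`. Separating request construction from sending keeps the request easy to inspect or log before any network call.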
### WebSocket

You open a persistent connection and stream data in both directions.

```mermaid theme={null}
sequenceDiagram
    participant Client
    participant SLNG
    Client->>SLNG: Connect wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en
    SLNG-->>Client: Connection open
    Client->>SLNG: { type: "text", text: "Hello" }
    SLNG-->>Client: audio chunk 1
    SLNG-->>Client: audio chunk 2
    SLNG-->>Client: audio chunk 3
    Client->>SLNG: { type: "text", text: "More text" }
    SLNG-->>Client: audio chunk 4
    Note over Client,SLNG: Connection stays open – send more text anytime
```

**Use WebSocket when you:**

* Build voice agents (STT + LLM + TTS in a loop)
* Need real-time transcription from a microphone
* Want the lowest possible latency
* Stream audio continuously

```javascript theme={null}
// Browsers can't set custom headers on a WebSocket handshake, so pass the
// key as the `token` query parameter. Keep real keys server-side in production.
const ws = new WebSocket(
  "wss://api.slng.ai/v1/tts/slng/deepgram/aura:2-en?token=SLNG_API_KEY"
);
// Deliver binary frames as ArrayBuffer (the browser default is Blob).
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  ws.send(JSON.stringify({ type: "text", text: "Hello from WebSocket!" }));
};

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    playAudio(event.data); // Your own playback function: stream audio as it arrives
  } else {
    console.log("Control message:", JSON.parse(event.data)); // JSON text frame
  }
};
```

See complete examples: [TTS over WebSocket](/examples/tts-websocket) · [STT over WebSocket](/examples/stt-websocket)
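On the receiving side, a WebSocket client has to separate audio from control messages and decide how long to wait before reconnecting. The following minimal Python sketch covers both. It assumes binary frames carry audio and text frames carry JSON control messages, including error frames shaped like `{"type": "error", "code": ..., "message": ...}`, and it follows the 1s, 2s, 4s, ... up to 30s reconnect schedule from the best practices below. `handle_frame` and `backoff_delays` are hypothetical helper names, and raising on error frames is one design choice among several.

```python theme={null}
import json
from typing import Iterator, List, Optional, Union


def handle_frame(frame: Union[bytes, str], audio_chunks: List[bytes]) -> Optional[dict]:
    """Route one received WebSocket frame.

    Binary frames are treated as audio and appended to `audio_chunks`.
    Text frames are parsed as JSON control messages and returned.
    Error frames are raised so the caller can decide whether to retry.
    """
    if isinstance(frame, (bytes, bytearray)):
        audio_chunks.append(bytes(frame))  # hand these to your audio player
        return None
    message = json.loads(frame)
    if message.get("type") == "error":
        raise RuntimeError(f"{message.get('code')}: {message.get('message')}")
    return message


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield reconnect delays in seconds: 1, 2, 4, 8, 16, then 30 for every later attempt."""
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(delay * 2, cap)
```

With a library such as `websockets`, `recv()` returns `bytes` for binary frames and `str` for text frames, so each received value can be passed straight to `handle_frame`. After a dropped connection, sleep for `next(delays)` seconds before reconnecting, and create a fresh `backoff_delays()` generator once a connection succeeds.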
## WebSocket Best Practices

WebSocket connections are stateful, so your client has to manage the connection lifecycle:

* **Reconnect with backoff.** If the connection drops, retry with an exponentially increasing delay (1s, 2s, 4s, ... up to 30s).
* **Handle binary and text frames.** Audio arrives as binary frames (an `ArrayBuffer` in the browser); control messages arrive as JSON text.
* **Send a close frame** when you're done so the server releases resources.

## Which Protocol for Which Use Case?

| Use case                          | Protocol  | Why                             |
| --------------------------------- | --------- | ------------------------------- |
| Convert a script to an audio file | HTTP      | One request, one file           |
| Transcribe a batch of recordings  | HTTP      | Upload each file, get results   |
| Voice agent (phone or web)        | WebSocket | Real-time STT→LLM→TTS loop      |
| Live captioning / transcription   | WebSocket | Stream mic audio, get text back |
| Generate a voiceover for a video  | HTTP      | No real-time requirement        |

## Next Steps

* [TTS over HTTP](/examples/tts-http): Generate audio files from text with simple HTTP requests.
* [TTS over WebSocket](/examples/tts-websocket): Stream audio in real time with interruption support.
* [STT over HTTP](/examples/stt-http): Transcribe pre-recorded audio files.
* [STT over WebSocket](/examples/stt-websocket): Transcribe live audio from a microphone.

# Error Codes & Troubleshooting
Source: https://docs.slng.ai/reference/errors

Authentication, WebSocket, and pronunciation-dictionary error codes for the SLNG API, plus fixes for common streaming and audio-format issues.

## Authentication errors

| Error                            | Cause                                                   | Fix                                                                                                                      |
| -------------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| HTTP 401                         | Missing or invalid SLNG key                             | Check the `Authorization: Bearer SLNG_API_KEY` header. Verify the key is active in the [Dashboard](https://app.slng.ai). |
| WebSocket rejected with 401      | SLNG key not passed in the handshake                    | Pass the key as a header (`Authorization: Bearer SLNG_API_KEY`) or query parameter (`?token=SLNG_API_KEY`).              |
| `X-Slng-Auth-Source: client_key` | Bring-your-own-key (BYOK) provider key rejected upstream | Your provider key was rejected, not the SLNG key. Check the key with the provider directly.                              |

## WebSocket errors

| Error code                  | Meaning                             | What to do                                                         |
| --------------------------- | ----------------------------------- | ------------------------------------------------------------------ |
| `auth_error`                | Invalid or missing SLNG key         | Check your Authorization header.                                   |
| `config_error`              | Invalid session configuration       | Verify your `init` message parameters.                             |
| `rate_limit`                | Too many concurrent connections     | Back off and retry with exponential delay.                         |
| `provider_error`            | Upstream model returned an error    | Check model availability; consider configuring failover models.    |
| `backend_connection_failed` | Could not connect to upstream model | The model may be temporarily unavailable; retry or use a fallback. |

## Pronunciation dictionary errors

| Status | Code                            | Meaning                                                                   |
| ------ | ------------------------------- | ------------------------------------------------------------------------- |
| 400    | `invalid_pronunciation`         | Malformed object, invalid name, unsupported mode, or dictionary not found |
| 401    | `pronunciation_unauthenticated` | Missing or unresolved organization context                                |
| 409    | `pronunciation_conflict`        | Dictionary name already exists in your organization                       |
| 503    | `pronunciation_unavailable`     | Platform dependency failure during dictionary resolution                  |

WebSocket pronunciation errors return an error frame:

```json theme={null}
{
  "type": "error",
  "code": "pronunciation_not_found",
  "message": "Pronunciation dictionary not found: support-pronunciations"
}
```

## Common issues

### Connection drops

The WebSocket disconnects unexpectedly.

* Implement reconnection with exponential backoff (1s, 2s, 4s, up to 30s).
* Send periodic keep-alive messages (`{"type": "keepalive"}` for STT) to prevent idle timeouts.
* If you're behind a corporate proxy, confirm it supports WebSocket upgrades (the `Upgrade: websocket` and `Connection: Upgrade` headers).

### Choppy or distorted audio (TTS)

* Buffer at least 200 ms of audio before starting playback.
* Use the Web Audio API (`AudioContext`) instead of `