Text-to-Speech Overview

SLNG gives you access to a range of text-to-speech models from multiple providers through a single platform. The execution layer assembles output from cache when possible, so repeated phrases return from the edge without hitting the upstream model. For the full, current list with latency, hosting, and language details, see Text-to-Speech Models. For voice samples, see Voices.

Which approach should you use?

I need to…	Use
Generate audio files for download or storage	HTTP
Stream audio in real time for a voice agent or app	WebSocket Streaming
Control how brand names and acronyms are pronounced	Add Pronunciation Dictionaries
Use my own provider keys with SLNG’s optimization	Enable BYOK

Which model should you use?

I need…	Recommended model	Why
Lowest latency English voice agent	Deepgram Aura 2 (SLNG-hosted)	Low-latency, deployed on SLNG infrastructure
Lowest latency with broad language coverage	Cartesia Sonic 3	Many languages, low latency
Hindi / Indian languages	Sarvam Bulbul	Indian languages, many voices
Multilingual with expressiveness control	KugelAudio Kugel	Broad language coverage
Spanish	Deepgram Aura 2 Spanish (SLNG-hosted)	SLNG-hosted Spanish voice

Execution layer behavior

Every TTS request goes through the output assembly stage, which checks whether this text, voice, and config combination has been assembled before:

Cache hit: audio returns from the edge. No upstream model call. No provider billing.
Cache miss: the request routes to the model endpoint, audio is synthesized, and the result is cached.

As more calls flow through the system, cache coverage grows. Repeated phrases (greetings, disclosures, confirmations) increasingly return as cache hits, with no upstream model call and no provider billing. See Output Assembly for caching, segment reuse, and cache scoping.
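The hit/miss check described above can be pictured with a small in-memory sketch. This is an illustration only, not SLNG's implementation: the names `assembly_cache_key` and `OutputAssemblyCache`, the SHA-256 key, and the dictionary store are all assumptions made for the example, and it does not model segment reuse or cache scoping.

```python
import hashlib
import json


def assembly_cache_key(text: str, voice: str, config: dict) -> str:
    """Derive a deterministic key from the text, voice, and config combination.

    Hypothetical helper: the real key derivation is not documented here.
    """
    payload = json.dumps(
        {"text": text, "voice": voice, "config": config},
        sort_keys=True,  # same config in a different key order yields the same key
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OutputAssemblyCache:
    """Toy stand-in for the output assembly stage (assumed, in-memory only)."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable standing in for the upstream model
        self._store = {}
        self.upstream_calls = 0

    def get_audio(self, text: str, voice: str, config: dict):
        key = assembly_cache_key(text, voice, config)
        if key in self._store:
            # Cache hit: return stored audio without calling the model.
            return self._store[key], "hit"
        # Cache miss: synthesize, count the upstream call, then cache the result.
        audio = self._synthesize(text, voice, config)
        self.upstream_calls += 1
        self._store[key] = audio
        return audio, "miss"
```

In this sketch, requesting the same text, voice, and config twice calls the synthesizer once; changing any of the three produces a different key and therefore a miss.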

Quick example

curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2 \
  -H "Authorization: Bearer $SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from SLNG!"}' \
  --output hello.wav

See HTTP for complete examples in cURL, JavaScript, and Python, and WebSocket Streaming for real-time voice agent audio.
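For readers who prefer Python, the same request can be sketched with the standard library alone. The URL, headers, and JSON body mirror the cURL call in this section. The helper names `build_tts_request` and `synthesize_to_file` are hypothetical, and treating the response body as raw audio bytes is an assumption inferred from the `--output hello.wav` flag. See the HTTP page for the maintained examples.

```python
import json
import urllib.request

# Endpoint taken from the cURL example in this section.
TTS_URL = "https://api.slng.ai/v1/tts/slng/deepgram/aura:2"


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, path: str, api_key: str) -> None:
    """Send the request and write the response body to disk.

    Assumes the response body is the audio itself; urlopen raises
    urllib.error.HTTPError on a non-2xx status.
    """
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)


# Example call (performs a real network request):
# import os
# synthesize_to_file("Hello from SLNG!", "hello.wav", os.environ["SLNG_API_KEY"])
```

Separating request construction from sending keeps the request easy to inspect or log before any call is made.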
