SLNG gives you access to a range of text-to-speech models from multiple providers through a single platform. The execution layer assembles output from cache when possible, so repeated phrases return from the edge without hitting the upstream model. For the full, current list with latency, hosting, and language details, see Text-to-Speech Models. For voice samples, see Voices.

Which approach should you use?

| I need to… | Use |
|---|---|
| Generate audio files for download or storage | HTTP |
| Stream audio in real time for a voice agent or app | WebSocket Streaming |
| Control how brand names and acronyms are pronounced | Add Pronunciation Dictionaries |
| Use my own provider keys with SLNG’s optimization | Enable BYOK |

Which model should you use?

| I need… | Recommended model | Why |
|---|---|---|
| Lowest latency English voice agent | Deepgram Aura 2 (SLNG-hosted) | Low-latency, deployed on SLNG infrastructure |
| Lowest latency with broad language coverage | Cartesia Sonic 3 | Many languages, low latency |
| Hindi / Indian languages | Sarvam Bulbul | Indian languages, many voices |
| Multilingual with expressiveness control | KugelAudio Kugel | Broad language coverage |
| Spanish | Deepgram Aura 2 Spanish (SLNG-hosted) | SLNG-hosted Spanish voice |

Execution layer behavior

Every TTS request goes through the output assembly stage, which checks whether this text, voice, and config combination has been assembled before:
  • Cache hit: audio returns from the edge. No upstream model call. No provider billing.
  • Cache miss: the request routes to the model endpoint, audio is synthesized, and the result is cached.
Cache coverage grows as more calls flow through the system, so repeated phrases (greetings, disclosures, confirmations) are increasingly served from the edge with no upstream model call and no provider billing. See Output Assembly for caching, segment reuse, and cache scoping.
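The hit/miss behavior described above can be illustrated with a small sketch. This is not SLNG's implementation: `cache_key`, `OutputCache`, and the `synthesize` callback are hypothetical names, the store is a plain in-memory dict standing in for the edge cache, and segment reuse and cache scoping are left out entirely. It only shows how keying on the text, voice, and config combination lets a repeated request skip the upstream call.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


def cache_key(text: str, voice: str, config: dict) -> str:
    """Derive a deterministic key from the text, voice, and config combination."""
    # sort_keys makes the key independent of the order config keys were added in.
    payload = json.dumps(
        {"text": text, "voice": voice, "config": config},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OutputCache:
    """Hypothetical in-memory stand-in for the edge cache (illustration only)."""

    def __init__(self, synthesize: Callable[[str, str, dict], bytes]) -> None:
        self._synthesize = synthesize  # stand-in for the upstream model call
        self._store: Dict[str, bytes] = {}
        self.upstream_calls = 0  # counts how often the upstream model was hit

    def get_audio(self, text: str, voice: str, config: dict) -> Tuple[bytes, bool]:
        """Return (audio, was_cache_hit) for one request."""
        key = cache_key(text, voice, config)
        if key in self._store:
            # Cache hit: no upstream call is made.
            return self._store[key], True
        # Cache miss: synthesize upstream, then store the result for next time.
        self.upstream_calls += 1
        audio = self._synthesize(text, voice, config)
        self._store[key] = audio
        return audio, False
```

In this sketch, requesting the same greeting twice with the same voice and config triggers one upstream call. Changing any of the three inputs produces a different key and therefore a miss.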

Quick example

curl https://api.slng.ai/v1/tts/slng/deepgram/aura:2 \
  -H "Authorization: Bearer SLNG_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from SLNG!"}' \
  --output hello.wav
See HTTP for complete examples in cURL, JavaScript, and Python, and WebSocket Streaming for real-time voice agent audio.
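For readers working in Python, the cURL request above might be mirrored with the standard library as in the sketch below. The URL, headers, and JSON body are copied from the cURL example. The helper names `build_tts_request` and `save_speech` are hypothetical, and the sketch assumes the response body is the raw audio, as the `--output hello.wav` flag suggests. The HTTP page remains the authoritative reference.

```python
import json
import urllib.request

# Endpoint copied from the cURL example above.
TTS_URL = "https://api.slng.ai/v1/tts/slng/deepgram/aura:2"


def build_tts_request(text: str, api_key: str, url: str = TTS_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(text: str, api_key: str, path: str) -> int:
    """Send the request and write the response bytes to `path`.

    Assumes the response body is the audio itself. urlopen raises
    urllib.error.HTTPError for non-2xx responses.
    """
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)  # number of bytes written
```

Calling `save_speech("Hello from SLNG!", api_key, "hello.wav")` with a real key would correspond to the cURL command. Keeping request construction separate from sending makes the payload easy to inspect without a network call.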