SLNG provides access to best-in-class speech models through a unified API. Each model is available over HTTP, WebSocket, or both, as listed in the Protocols column of the tables below, and all models provide production-ready performance.
All Models Overview
| Provider | Model | Variants | Protocols | Latency | Languages | Best For |
|---|---|---|---|---|---|---|
| Deepgram | Aura | Aura 2 | HTTP, WebSocket | <200ms | EN, ES | Most voices (50+) with Spanish support |
| ElevenLabs | Eleven v3 | Eleven v3 | HTTP | <100ms | Multiple | Lowest latency (<100ms) and best voice quality |
| ElevenLabs | Eleven Flash | Eleven Flash v2 | HTTP, WebSocket | <100ms | Multiple | Ultra-low latency (<100ms) for real-time applications |
| ElevenLabs | Eleven Flash | Eleven Flash v2.5 | HTTP, WebSocket | <100ms | Multiple | Latest ultra-low latency (<100ms) with multilingual support |
| ElevenLabs | Eleven Multilingual | Multilingual v2 | HTTP, WebSocket | <100ms | Multiple | Multilingual support with low latency |
| SLNG | Orpheus | Orpheus English | HTTP, WebSocket | low | EN | Emotion control and custom deployment |
| SLNG | Chatterbox | Chatterbox v2 | HTTP | ~500ms | EN, ES, FR, DE, IT, PT, PL, TR, RU, NL, CS, AR, ZH, JA, HU, KO, HI, FI, VI, RO, BG, HR, TH | Open source TTS with emotion control, voice cloning, and 23 language support |
| SLNG | CSM-1B | CSM-1B | HTTP | medium | EN | Conversational applications |
| SLNG | Melo | Melo v1 | HTTP | medium | EN, ES, FR, ZH, JA, KO | High-quality voice synthesis |
| SLNG | CosyVoice | CosyVoice v1 | HTTP | medium | EN, ZH | Natural voice synthesis |
| SLNG | Kokoro | Kokoro v1 | HTTP | medium | EN | High-quality voice synthesis |
| SLNG | Aura | Aura 2 | HTTP, WebSocket | medium | EN, ES | Most voices (50+) with Spanish support |
| Deepgram | Nova | 3 variants | HTTP, WebSocket | real-time | EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI, NL, SV, DA, NO, FI, PL, TR, HE, TH, VI, ID, MS, TL, CS, SK, HU, RO, BG, HR, SL, ET, LV, LT, UK, EL, CA, EU, GL | Real-time streaming with lowest latency |
| SLNG | Whisper | Whisper Large v3 | HTTP, WebSocket | batch | Multiple | Highest transcription accuracy for batch processing |
| SLNG | Speechmatics | 2 variants | WebSocket | real-time | Multiple | High-accuracy English speech recognition in real-time scenarios |
| SLNG | Nova | 2 variants | HTTP, WebSocket | real-time | EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI, NL, SV, DA, NO, FI, PL, TR, HE, TH, VI, ID, MS, TL, CS, SK, HU, RO, BG, HR, SL, ET, LV, LT, UK, EL, CA, EU, GL | Real-time streaming with lowest latency |
Text-to-Speech Models
| Provider | Model | Variants | Protocols | Latency | Languages | Best For |
|---|---|---|---|---|---|---|
| Deepgram | Aura | Aura 2 | HTTP, WebSocket | <200ms | EN, ES | Most voices (50+) with Spanish support |
| ElevenLabs | Eleven v3 | Eleven v3 | HTTP | <100ms | Multiple | Lowest latency (<100ms) and best voice quality |
| ElevenLabs | Eleven Flash | Eleven Flash v2 | HTTP, WebSocket | <100ms | Multiple | Ultra-low latency (<100ms) for real-time applications |
| ElevenLabs | Eleven Flash | Eleven Flash v2.5 | HTTP, WebSocket | <100ms | Multiple | Latest ultra-low latency (<100ms) with multilingual support |
| ElevenLabs | Eleven Multilingual | Multilingual v2 | HTTP, WebSocket | <100ms | Multiple | Multilingual support with low latency |
| SLNG | Orpheus | Orpheus English | HTTP, WebSocket | low | EN | Emotion control and custom deployment |
| SLNG | Chatterbox | Chatterbox v2 | HTTP | ~500ms | EN, ES, FR, DE, IT, PT, PL, TR, RU, NL, CS, AR, ZH, JA, HU, KO, HI, FI, VI, RO, BG, HR, TH | Open source TTS with emotion control, voice cloning, and 23 language support |
| SLNG | CSM-1B | CSM-1B | HTTP | medium | EN | Conversational applications |
| SLNG | Melo | Melo v1 | HTTP | medium | EN, ES, FR, ZH, JA, KO | High-quality voice synthesis |
| SLNG | CosyVoice | CosyVoice v1 | HTTP | medium | EN, ZH | Natural voice synthesis |
| SLNG | Kokoro | Kokoro v1 | HTTP | medium | EN | High-quality voice synthesis |
| SLNG | Aura | Aura 2 | HTTP, WebSocket | medium | EN, ES | Most voices (50+) with Spanish support |
Speech-to-Text Models
| Provider | Model | Variants | Protocols | Latency | Languages | Best For |
|---|---|---|---|---|---|---|
| Deepgram | Nova | 3 variants | HTTP, WebSocket | real-time | EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI, NL, SV, DA, NO, FI, PL, TR, HE, TH, VI, ID, MS, TL, CS, SK, HU, RO, BG, HR, SL, ET, LV, LT, UK, EL, CA, EU, GL | Real-time streaming with lowest latency |
| SLNG | Whisper | Whisper Large v3 | HTTP, WebSocket | batch | Multiple | Highest transcription accuracy for batch processing |
| SLNG | Speechmatics | 2 variants | WebSocket | real-time | Multiple | High-accuracy English speech recognition in real-time scenarios |
| SLNG | Nova | 2 variants | HTTP, WebSocket | real-time | EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI, NL, SV, DA, NO, FI, PL, TR, HE, TH, VI, ID, MS, TL, CS, SK, HU, RO, BG, HR, SL, ET, LV, LT, UK, EL, CA, EU, GL | Real-time streaming with lowest latency |
Model Selection Guide
The right model depends on your use case and requirements. The recommendations below are based on our internal benchmarks.
By Use Case
Real-time Voice Agents
- Primary: Deepgram Nova (STT) + Deepgram Aura (TTS)
- Alternative: Deepgram Nova (STT) + ElevenLabs Eleven (TTS)
- Why: Lowest combined latency, optimized for conversations
Content Creation & Audiobooks
- Primary: ElevenLabs Eleven v3
- Alternative: SLNG Orpheus
- Why: Best voice quality, emotion control, natural prosody
Batch Transcription
- Primary: SLNG Whisper Large v3
- Alternative: Deepgram Nova 2
- Why: Highest accuracy, multi-language support
IVR & Phone Systems
- Primary: Deepgram Aura
- Why: 50+ voices, professional quality, regional accents
Multilingual Applications
- TTS: ElevenLabs Multilingual v2 or Deepgram Aura (English/Spanish)
- STT: SLNG Whisper or Deepgram Nova
- Why: Broad language coverage, consistent quality
By Performance Requirement
Lowest Latency (sub-100ms)
- ElevenLabs Eleven Flash v2.5
Most Voices (50+)
- Deepgram Aura
Best Transcription Accuracy
- SLNG Whisper Large v3
Real-time Streaming
- Deepgram Nova 2 (STT)
- Deepgram Aura (TTS)
Emotion Control
- SLNG Orpheus
By Language Support
English Only
- SLNG Orpheus
- All models support English
English + Spanish
- Deepgram Aura
Multiple Languages
- ElevenLabs Eleven Multilingual
- SLNG Whisper
- Deepgram Nova
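The recommendations above amount to filtering the catalog by task, language, and protocol. The following is a minimal sketch of that idea, assuming a hand-copied subset of the tables on this page. The `Model` dataclass, the `CATALOG` list, and the `pick` helper are hypothetical names for illustration and are not part of any SLNG SDK.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Model:
    provider: str
    name: str
    task: str  # "tts" or "stt"
    protocols: FrozenSet[str]
    # None stands for "Multiple" in the table, where languages are not itemized.
    languages: Optional[FrozenSet[str]]


# Hand-copied subset of the tables on this page; not fetched from any API.
CATALOG = [
    Model("Deepgram", "Aura 2", "tts",
          frozenset({"http", "websocket"}), frozenset({"EN", "ES"})),
    Model("ElevenLabs", "Eleven v3", "tts",
          frozenset({"http"}), None),
    Model("SLNG", "Orpheus English", "tts",
          frozenset({"http", "websocket"}), frozenset({"EN"})),
    Model("SLNG", "Melo v1", "tts",
          frozenset({"http"}), frozenset({"EN", "ES", "FR", "ZH", "JA", "KO"})),
    Model("SLNG", "Whisper Large v3", "stt",
          frozenset({"http", "websocket"}), None),
    Model("SLNG", "Speechmatics", "stt",
          frozenset({"websocket"}), None),
]


def pick(task: str,
         language: Optional[str] = None,
         protocol: Optional[str] = None) -> List[str]:
    """Return names of catalog models matching every filter that was given."""
    matches = []
    for m in CATALOG:
        if m.task != task:
            continue
        if protocol is not None and protocol not in m.protocols:
            continue
        # Models listed as "Multiple" are skipped when a language is required,
        # because the table does not confirm which languages they cover.
        if language is not None and (m.languages is None
                                     or language not in m.languages):
            continue
        matches.append(m.name)
    return matches


print(pick("tts", language="ES", protocol="websocket"))  # ['Aura 2']
```

Skipping the "Multiple" rows when a language is requested is a conservative choice made for this sketch; a real catalog would list those languages explicitly.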
Protocol Support
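Most models support both of the following; a few are HTTP-only or WebSocket-only, as listed in the Protocols column of the tables above: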
- HTTP - Simple request/response for batch processing
- WebSocket - Real-time bidirectional streaming
Some models also support:
- Server-Sent Events (SSE) - Progressive response streaming
See Protocol Comparison for detailed information.
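Because the tables list some models as HTTP-only or WebSocket-only, a client may want to pick a protocol per model and fall back when the preferred one is missing. The sketch below illustrates one such policy. The `choose_protocol` function is a hypothetical helper, not an SLNG API, and the preference order (WebSocket for streaming, HTTP otherwise) is an assumption drawn from the descriptions above.

```python
from typing import Iterable


def choose_protocol(supported: Iterable[str], streaming: bool) -> str:
    """Pick a protocol from a model's supported set.

    Prefers WebSocket for real-time streaming and HTTP for batch work,
    falling back to whatever the model does support.
    """
    available = {p.lower() for p in supported}
    if not available:
        raise ValueError("model lists no supported protocols")
    preferred = "websocket" if streaming else "http"
    if preferred in available:
        return preferred
    # Deterministic fallback: first protocol in alphabetical order.
    return sorted(available)[0]


# Protocol sets copied from the Protocols column on this page.
print(choose_protocol(["HTTP", "WebSocket"], streaming=True))  # websocket
print(choose_protocol(["HTTP"], streaming=True))               # http
```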
Pricing Overview
Pricing varies by model and provider:
- TTS Models: Typically charged per 1,000 characters
- STT Models: Typically charged per minute of audio
For exact pricing, contact [email protected] or check your dashboard.
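As a rough illustration of how the two billing units translate into a cost estimate, here is a small sketch. The rates passed in are made-up placeholders, not SLNG prices, and the helpers assume usage is prorated rather than rounded up to a whole unit, which may not match actual billing.

```python
def tts_cost(characters: int, rate_per_1k_chars: float) -> float:
    """Estimated TTS cost when billing is per 1,000 characters (prorated)."""
    return characters / 1000 * rate_per_1k_chars


def stt_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Estimated STT cost when billing is per minute of audio (prorated)."""
    return audio_seconds / 60 * rate_per_minute


# Placeholder rates for illustration only; check your dashboard for real ones.
EXAMPLE_TTS_RATE = 0.10  # currency units per 1,000 characters (hypothetical)
EXAMPLE_STT_RATE = 0.02  # currency units per minute of audio (hypothetical)

print(f"TTS, 2,500 characters: {tts_cost(2500, EXAMPLE_TTS_RATE):.2f}")
print(f"STT, 90 seconds: {stt_cost(90, EXAMPLE_STT_RATE):.2f}")
```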
Adding New Models
We continuously add new models and providers. Check back regularly for updates, or subscribe to our newsletter for announcements. Let us know if there is a specific model or provider you'd like to see supported!
Coming Soon:
- Additional language variants
- More voice options
- Enhanced real-time features