Models
Overview
This is the master list of all AI models supported by slng.ai. Models are grouped by type (STT, TTS, LLM), and each entry gives details such as the model ID, pricing, geographic availability, and documentation link.
Note: Geographic availability defaults to USA if no specific region is listed. Premium regions may have limited availability.
🎤 Speech-to-Text (STT) Models
Whisper Family
Whisper models are STT only; no Whisper TTS models are currently supported.
OpenAI Whisper (STT Only) 🌍 Available in: All Standard Regions
- Model: whisper-v3
- Description: OpenAI's best speech recognition model
- Documentation: Whisper V3 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.006 per minute of audio
- Languages: en, es, fr, de, it, pt, ru, zh, ja, ko, ar, hi
- Model: whisper-v3-turbo
- Description: Cost-effective and faster version for bulk processing
- Documentation: Whisper V3 Turbo API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.006 per minute of audio
- Languages: en, es, fr, de, it, pt, ru, zh, ja, ko, ar, hi
- Model: whisper-v3-large-streaming
- Description: WebSocket endpoint for real-time streaming transcription
- Documentation: Whisper V3 Large Streaming API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.006 per minute of audio
- Languages: en, es, fr, de, it, pt, ru, zh, ja, ko, ar, hi
- Model: openai-whisper
- Description: Official OpenAI Whisper model (proprietary)
- Documentation: OpenAI Whisper API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.006 per minute of audio
- Languages: en, es, fr, de, it, pt, ru, zh, ja, ko, ar, hi
WhisperX (with Diarization) 🌍 Available in: All Standard Regions
- Model: whisperx
- Description: Advanced STT with speaker diarization and word alignment
- Documentation: WhisperX API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.006 per minute of audio
- Languages: Auto-detected, supports 99+ languages
- Features: Speaker diarization, word alignment, multiple output formats
Kyutai 🇮🇳 Available in: India (Mumbai) Only
- Model: kyutai
- Description: High-performance streaming STT for French/English
- Documentation: Kyutai API
- Geographic Availability: India (AWS Mumbai)
- Pricing: 4 credits per minute of audio
- Languages: en, fr
🔊 Text-to-Speech (TTS) Models
VUI 🌍 Available in: All Standard Regions
- Model: vui
- Description: Fast, reliable baseline TTS model
- Documentation: VUI API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.10 per minute of audio
- Languages: en
- Voices: default
- Features: Streaming, async generation
- Voice Cloning: ✅ Yes (via speaker_voice parameter)
Orpheus 🌍 Available in: All Standard Regions
- Model: orpheus
- Description: Frontier TTS model with emotive speech capabilities
- Documentation: Orpheus API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.30 per minute of audio
- Languages: en, fr, de, ko, zh, es, it, hi
- Voices: tara, leah, jess, leo, dan, mia, zac, zoe
- Features: Multiple languages, emotive tags, streaming, async
- Voice Cloning: ❌ No (pre-built voices only)
Orpheus Indic 🇮🇳 Available in: India (Mumbai) Only
- Model: orpheus-indic
- Description: Optimized for 8 major Indian languages
- Documentation: Orpheus Indic API
- Geographic Availability: India (AWS Mumbai)
- Pricing: $0.04 per minute of audio
- Languages: hi, ta, te, bn, mr, gu, kn, ml
- Voices: hindi_male, hindi_female, tamil_male, tamil_female, telugu_male, bengali_female, marathi_male, gujarati_female, kannada_male, malayalam_female
- Voice Cloning: ❌ No (pre-built voices only)
Kokoro 🌍 Available in: All Standard Regions
- Model: kokoro
- Description: Efficient 82M parameter TTS model
- Documentation: Kokoro API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.15 per minute of audio
- Languages: en
- Features: Fast generation, base64 output
- Voice Cloning: ❌ No (single voice only)
XTTS-V2 🌍 Available in: All Standard Regions
- Model: xtts-v2
- Description: High-fidelity multilingual TTS with voice cloning
- Documentation: XTTS-V2 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.50 per minute of audio
- Languages: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, ja, ko, hu, hi
- Features: Voice cloning, 17 languages, speaker embedding
- Voice Cloning: ✅ Yes (via speaker_voice or speaker_wav parameter)
MARS6 🌍 Available in: All Standard Regions
- Model: mars6
- Description: Advanced TTS with voice and prosody cloning
- Documentation: MARS6 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.60 per minute of audio
- Languages: en-us, fr-fr, de-de, es-es, it-it, pt-pt, zh-cn, ja-jp, ko-kr, nl-nl
- Features: Voice cloning, prosody cloning, audio reference required
- Voice Cloning: ✅ Yes (via audio_ref parameter, required)
Twi SpeechT5 🌍 Available in: All Standard Regions
- Model: twi-speecht5
- Description: Specialized TTS model for Twi language
- Documentation: Twi SpeechT5 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.25 per minute of audio
- Languages: tw
- Features: Speaker embedding, voice customization
- Voice Cloning: ✅ Yes (via speaker embedding)
🎭 ElevenLabs Models
ElevenLabs Multi-v2 🌍 Available in: All Standard Regions
- Model: elevenlabs/multi-v2
- Description: Multilingual model supporting 29+ languages
- Documentation: ElevenLabs Multi-v2 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.30 per minute of audio
- Languages: en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko
- Voices: Rachel, Drew, Clyde, Paul, Domi, Dave, Fin, Bella
- Voice Cloning: ✅ Yes (via voice cloning API)
ElevenLabs Turbo v2.5 🌍 Available in: All Standard Regions
- Model: elevenlabs/turbo-v2-5
- Description: Ultra-fast TTS with low latency
- Documentation: ElevenLabs Turbo v2.5 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.25 per minute of audio
- Languages: en, es, fr, de, it, pt
- Voices: Rachel, Drew, Clyde, Paul
- Voice Cloning: ✅ Yes (via voice cloning API)
ElevenLabs v3 🌍 Available in: All Standard Regions
- Model: elevenlabs/v3
- Description: Latest generation with high-quality voices
- Documentation: ElevenLabs v3 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.35 per minute of audio
- Languages: en, es, fr, de, it, pt, pl, hi
- Voices: Rachel, Drew, Clyde, Paul, Domi, Dave, Fin, Bella
- Voice Cloning: ✅ Yes (via voice cloning API)
ElevenLabs TTV-v3 🌍 Available in: All Standard Regions
- Model: elevenlabs/ttv-v3
- Description: Text-to-video optimized TTS model
- Documentation: ElevenLabs TTV-v3 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.35 per minute of audio
- Languages: en, es, fr, de, it, pt, pl, hi
- Voices: Rachel, Drew, Clyde, Paul, Domi, Dave, Fin, Bella
- Voice Cloning: ✅ Yes (via voice cloning API)
ElevenLabs Flash v2.5 🌍 Available in: All Standard Regions
- Model: elevenlabs/flash-v2-5
- Description: Fast model with good quality balance
- Documentation: ElevenLabs Flash v2.5 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.20 per minute of audio
- Languages: en, es, fr, de, it, pt, pl
- Voices: Rachel, Drew, Clyde, Paul
- Voice Cloning: ✅ Yes (via voice cloning API)
ElevenLabs Flash v2 🌍 Available in: All Standard Regions
- Model: elevenlabs/flash-v2
- Description: Fast TTS model for real-time applications
- Documentation: ElevenLabs Flash v2 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.20 per minute of audio
- Languages: en, es, fr, de, it, pt, pl
- Voices: Rachel, Drew, Clyde, Paul
- Voice Cloning: ✅ Yes (via voice cloning API)
ElevenLabs Turbo v2 🌍 Available in: All Standard Regions
- Model: elevenlabs/turbo-v2
- Description: High-speed TTS with quality optimization
- Documentation: ElevenLabs Turbo v2 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.25 per minute of audio
- Languages: en, es, fr, de, it, pt, pl
- Voices: Rachel, Drew, Clyde, Paul
- Voice Cloning: ✅ Yes (via voice cloning API)
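When choosing a TTS model, a common first filter is the target language, followed by price. The sketch below is one way to express that lookup in Python. It hard-codes a subset of the entries listed above (IDs, prices, and language codes are copied from this page); the TTS_CATALOG structure and the models_for_language function are illustrative names, not part of any slng.ai SDK. MARS6 is left out because it is listed with locale codes (en-us, fr-fr) rather than bare language codes.

```python
# Subset of the TTS catalog on this page: model ID -> (USD per minute, languages).
# TTS_CATALOG and models_for_language are hypothetical names used for illustration.
TTS_CATALOG = {
    "vui": (0.10, {"en"}),
    "orpheus": (0.30, {"en", "fr", "de", "ko", "zh", "es", "it", "hi"}),
    # orpheus-indic is listed as available in India (Mumbai) only.
    "orpheus-indic": (0.04, {"hi", "ta", "te", "bn", "mr", "gu", "kn", "ml"}),
    "xtts-v2": (0.50, {"en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
                       "nl", "cs", "ar", "zh", "ja", "ko", "hu", "hi"}),
    "twi-speecht5": (0.25, {"tw"}),
    "elevenlabs/multi-v2": (0.30, {"en", "es", "fr", "de", "it", "pt", "pl",
                                   "hi", "ar", "zh", "ja", "ko"}),
    "elevenlabs/flash-v2-5": (0.20, {"en", "es", "fr", "de", "it", "pt", "pl"}),
}


def models_for_language(language):
    """Return model IDs that list `language`, cheapest first (ties broken by ID)."""
    matches = [
        (price, model_id)
        for model_id, (price, languages) in TTS_CATALOG.items()
        if language in languages
    ]
    return [model_id for _, model_id in sorted(matches)]


if __name__ == "__main__":
    for code in ("hi", "pl", "tw"):
        print(code, models_for_language(code))
```

The result only reflects the language lists on this page; regional availability still has to be checked separately, since the cheapest match for a language may be restricted to a single region.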
🧠 Large Language Models (LLM)
Llama 4 Scout
- Model: llama-4-scout
- Description: Fast, capable model with 17B active parameters
- Documentation: Llama 4 Scout API
- Geographic Availability: USA (AWS Virginia)
- Pricing: $0.10 per 1M tokens
- Features: Text generation, conversation, instruction following
Kimi K2
- Model: kimi-k2
- Description: Advanced language model for complex reasoning
- Documentation: Kimi K2 API
- Geographic Availability: USA (AWS Virginia)
- Pricing: Contact for pricing
- Features: Advanced reasoning, multilingual support
🌍 Geographic Availability Summary
Standard Regions (Full Model Availability)
- USA East (us-east-1): All models
- USA West (us-west-1): All models
- Europe Central (eu-central-1): All models
- Europe West 3 (eu-west-3): All models
- Europe West 4 (europe-west4): All models
- Europe West 1 (eu-west-1): All models
- Asia South 1 (ap-south-1): All models
- Asia Southeast 1 (asia-southeast1): All models
Premium Regions (Limited Availability)
- Canada Central (ca-central-1): Contact for availability
- Europe West 2 (eu-west-2): Contact for availability
- South America East 1 (sa-east-1): Contact for availability
- Middle East West 1 (me-west1): Contact for availability
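Client code that accepts a region from configuration may want to check which tier it falls in before sending traffic. The following is a minimal sketch that assumes the two region lists above are complete; the region IDs are copied from this page, while the constant names and the region_tier function are hypothetical and do not correspond to an slng.ai API.

```python
# Region IDs copied from the two lists above.
# The set names and region_tier are illustrative, not an slng.ai API.
STANDARD_REGIONS = frozenset({
    "us-east-1", "us-west-1", "eu-central-1", "eu-west-3",
    "europe-west4", "eu-west-1", "ap-south-1", "asia-southeast1",
})
PREMIUM_REGIONS = frozenset({
    "ca-central-1", "eu-west-2", "sa-east-1", "me-west1",
})


def region_tier(region_id):
    """Classify a region ID as 'standard', 'premium', or 'unlisted'."""
    normalized = region_id.strip().lower()
    if normalized in STANDARD_REGIONS:
        return "standard"
    if normalized in PREMIUM_REGIONS:
        return "premium"
    return "unlisted"


if __name__ == "__main__":
    for region in ("eu-west-3", "sa-east-1", "af-south-1"):
        print(region, "->", region_tier(region))
```

A "premium" or "unlisted" result is a signal to contact support or submit a region request rather than a guarantee that the model is unavailable.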
🌍 Want a Model in Additional Regions?
Need a specific model available in a different region? We're constantly expanding our global infrastructure to serve customers worldwide.
Submit your region request here: Request New Region Access
Include in your request:
- Which model(s) you need
- Your target region(s)
- Your use case and compliance requirements
- Expected monthly usage volume
We'll evaluate your request and get back to you within 2 business days.
🎭 Voice Cloning Summary
Models with Voice Cloning ✅
- VUI: Basic voice cloning via speaker_voice parameter
- XTTS-V2: Advanced voice cloning in 17 languages
- MARS6: Voice + prosody cloning (requires audio_ref)
- Twi SpeechT5: Speaker embedding and voice customization
- All ElevenLabs Models: Professional voice cloning API
Models without Voice Cloning ❌
- Orpheus: Pre-built voices only (tara, leah, jess, etc.)
- Orpheus Indic: Pre-built Indian language voices
- Kokoro: Single voice only
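The summary above can be captured as a small lookup table so that calling code knows which request parameter, if any, carries the reference voice. This is a sketch under stated assumptions: the parameter names (speaker_voice, speaker_wav, audio_ref) come from this page, but the CloningSupport class, the VOICE_CLONING table, and the cloning_support function are hypothetical. No parameter name is given here for Twi SpeechT5 or the ElevenLabs models, so none is filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloningSupport:
    supported: bool
    parameters: tuple = ()   # request parameter names named on this page
    required: bool = False   # True when a reference must always be supplied
    note: str = ""


# Keys are the display names used in the summary above.
VOICE_CLONING = {
    "VUI": CloningSupport(True, ("speaker_voice",)),
    "XTTS-V2": CloningSupport(True, ("speaker_voice", "speaker_wav")),
    "MARS6": CloningSupport(True, ("audio_ref",), required=True),
    "Twi SpeechT5": CloningSupport(True, note="via speaker embedding"),
    "Orpheus": CloningSupport(False, note="pre-built voices only"),
    "Orpheus Indic": CloningSupport(False, note="pre-built voices only"),
    "Kokoro": CloningSupport(False, note="single voice only"),
}


def cloning_support(model_name):
    """Look up cloning support; every ElevenLabs model uses a separate cloning API."""
    if model_name.startswith("ElevenLabs"):
        return CloningSupport(True, note="via voice cloning API")
    return VOICE_CLONING[model_name]  # raises KeyError for models not on this page


if __name__ == "__main__":
    for name in ("MARS6", "Orpheus", "ElevenLabs v3"):
        print(name, cloning_support(name))
```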
💰 Pricing Summary
TTS Models (per minute of audio)
- Orpheus Indic: $0.04 (most affordable; India region only)
- VUI: $0.10
- Kokoro: $0.15
- ElevenLabs Flash v2 and Flash v2.5: $0.20
- ElevenLabs Turbo v2 and Turbo v2.5: $0.25
- Twi SpeechT5: $0.25
- Orpheus: $0.30
- ElevenLabs Multi-v2: $0.30
- ElevenLabs v3 and TTV-v3: $0.35
- XTTS-V2: $0.50
- MARS6: $0.60 (most expensive)
STT Models (per minute of audio)
- Whisper models (including WhisperX): $0.006
- Kyutai: 4 credits
LLM Models (per 1M tokens)
- Llama 4 Scout: $0.10
- Kimi K2: Contact for pricing
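To turn these rates into a budget estimate, multiply the rate by the number of audio minutes (TTS, STT) or by tokens divided by one million (LLM). The sketch below does this with Python's decimal module to avoid floating-point rounding in currency values. The rates are copied from this page; the dictionary and function names are illustrative only. Kyutai and Kimi K2 are omitted because this page gives no USD rate for them.

```python
from decimal import Decimal

# USD per minute of generated audio, copied from this page.
TTS_USD_PER_MINUTE = {
    "Orpheus Indic": Decimal("0.04"),
    "VUI": Decimal("0.10"),
    "Kokoro": Decimal("0.15"),
    "ElevenLabs Flash v2.5": Decimal("0.20"),
    "ElevenLabs Turbo v2.5": Decimal("0.25"),
    "Twi SpeechT5": Decimal("0.25"),
    "Orpheus": Decimal("0.30"),
    "ElevenLabs Multi-v2": Decimal("0.30"),
    "ElevenLabs v3": Decimal("0.35"),
    "XTTS-V2": Decimal("0.50"),
    "MARS6": Decimal("0.60"),
}
WHISPER_USD_PER_MINUTE = Decimal("0.006")
LLAMA_4_SCOUT_USD_PER_MILLION_TOKENS = Decimal("0.10")


def tts_cost(model_name, minutes):
    """Estimated USD cost of `minutes` of synthesized audio."""
    return TTS_USD_PER_MINUTE[model_name] * Decimal(str(minutes))


def whisper_cost(minutes):
    """Estimated USD cost of transcribing `minutes` of audio with a Whisper model."""
    return WHISPER_USD_PER_MINUTE * Decimal(str(minutes))


def llama_4_scout_cost(tokens):
    """Estimated USD cost of `tokens` tokens on Llama 4 Scout."""
    return LLAMA_4_SCOUT_USD_PER_MILLION_TOKENS * Decimal(tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    print("30 min of Orpheus TTS:", tts_cost("Orpheus", 30))
    print("90 min of Whisper STT:", whisper_cost(90))
    print("2.5M Llama 4 Scout tokens:", llama_4_scout_cost(2_500_000))
```

These are list-price estimates only; they do not account for minimum charges, rounding of partial minutes, or regional pricing, none of which this page specifies.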
Last updated: June 2025