Concepts
Which Model Should I Use?
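Choose an endpoint based on how much control you want: let the platform route the request to a model for you, or call a specific model's endpoint directly. The tables below list the available endpoints.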
🔊 TTS Models
Endpoint | Model | Description |
---|---|---|
/tts/vui | VUI | Fast, low-latency voice model with natural prosody. Great default. |
/tts/orpheus | Orpheus | Richer expressive tone; good for more emotional or human-like delivery. |
/tts/koroko | Koroko | Multi-language support with speaker control and voice cloning options. |
/tts/xtts/v2 | XTTS v2 | High-fidelity multilingual model with speaker ID, cloning, and accent precision. |
/tts/mars6 | Mars 6 (optional) | Experimental or internal-use model; not always available. |
🎤 STT Models
Endpoint | Model | Description |
---|---|---|
/stt/whisper-v3 | Whisper v3 | Accurate, multilingual transcription with broad language support. |
/stt/whisper-v3-turbo | Whisper Turbo | Faster, more cost-effective variant of Whisper v3 for bulk or low-latency tasks. |
/dia/whisperx | WhisperX | Diarization-capable transcription with word-level timestamps. |
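As a rough sketch of how the STT table could be used from a script, the helper below maps a need (accuracy, speed, diarization) to one of the endpoints listed above. Only the three endpoint paths come from the table. The host `https://api.example.com`, the bearer-token header, the multipart field name `file`, and the function names `stt_path` and `transcribe` are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: host, auth header, and upload field name are assumptions.
BASE_URL="${BASE_URL:-https://api.example.com}"  # placeholder host

# Map a need to an endpoint path taken from the STT table.
stt_path() {
  case "$1" in
    accurate) printf '/stt/whisper-v3\n' ;;        # broad language support
    fast)     printf '/stt/whisper-v3-turbo\n' ;;  # bulk or low-latency work
    diarize)  printf '/dia/whisperx\n' ;;          # speakers + word timestamps
    *) echo "unknown need: $1" >&2; return 1 ;;
  esac
}

# Upload an audio file to the chosen endpoint and print the response.
# Requires curl; API_KEY must be set in the environment.
transcribe() {
  local need="$1" audio="$2" path
  path="$(stt_path "$need")" || return 1
  curl --fail --silent --show-error \
    -X POST "${BASE_URL}${path}" \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY first}" \
    -F "file=@${audio}"   # "file" is an assumed field name
}
```

A call such as `transcribe diarize meeting.wav` would then target `/dia/whisperx`. Check the actual request format before relying on this.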
📦 Example: Smart-Routed
Code(bash)
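A smart-routed request might look like the sketch below, where no model is named and the service is left to choose one. Every specific here is an assumption: the host `https://api.example.com`, an un-suffixed `/tts` path as the routed endpoint, the bearer-token header, the JSON body with a `text` field, and the function names `smart_tts_url` and `smart_tts`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: host, path, auth header, and body field are assumptions.
BASE_URL="${BASE_URL:-https://api.example.com}"  # placeholder host

# Assumed smart-routed URL: no model segment, so the service picks the model.
smart_tts_url() {
  printf '%s/tts\n' "$BASE_URL"
}

# Send text to the assumed routed endpoint and save the audio response.
# Requires curl and jq; API_KEY must be set in the environment.
smart_tts() {
  local text="$1" out="${2:-speech.wav}"
  curl --fail --silent --show-error \
    -X POST "$(smart_tts_url)" \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg text "$text" '{text: $text}')" \
    --output "$out"
}
```

Usage would be along the lines of `smart_tts "Hello there" hello.wav`. Building the body with `jq` keeps quotes and backslashes in the text correctly escaped.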
📦 Example: Explicit Model
Code(bash)
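An explicit-model request targets one of the TTS endpoints from the table directly. The sketch below is illustrative only: the endpoint paths are taken from the table, while the host `https://api.example.com`, the bearer-token header, the JSON `text` field, the short model keys (such as `xtts-v2`), and the function names `tts_path` and `explicit_tts` are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: host, auth header, body field, and model keys are assumptions.
BASE_URL="${BASE_URL:-https://api.example.com}"  # placeholder host

# Map a short model key to its endpoint path from the TTS table.
tts_path() {
  case "$1" in
    vui|orpheus|koroko|mars6) printf '/tts/%s\n' "$1" ;;
    xtts-v2)                  printf '/tts/xtts/v2\n' ;;
    *) echo "unknown TTS model: $1" >&2; return 1 ;;
  esac
}

# Synthesize text with a specific model and save the audio response.
# Requires curl and jq; API_KEY must be set in the environment.
explicit_tts() {
  local model="$1" text="$2" out="${3:-speech.wav}" path
  path="$(tts_path "$model")" || return 1
  curl --fail --silent --show-error \
    -X POST "${BASE_URL}${path}" \
    -H "Authorization: Bearer ${API_KEY:?set API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$(jq -n --arg text "$text" '{text: $text}')" \
    --output "$out"
}
```

For example, `explicit_tts orpheus "Welcome back" welcome.wav` would post to `/tts/orpheus`. Since Mars 6 is described as not always available, a caller may want to fall back to another model when that request fails.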
Last updated: June 2025