VUI [TTS] (Default/USA)
Generate audio from text using VUI voice model (default region: USA).
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice to use for synthesis (optional)
Default: default
stream
boolean
Whether to stream the audio response
Default: false
async
boolean
Whether to use async prediction (returns prediction_id)
Default: false
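A minimal sketch of how a request to this endpoint could be assembled with the Python standard library. The URL below is a placeholder (this reference does not state the endpoint path), the JSON content type is an assumption, and `build_vui_request` is a hypothetical helper. Only the Bearer header format and the body fields `text`, `voice`, `stream`, and `async` come from the reference above.

```python
import json
import urllib.request

# Placeholder URL: the reference does not give the endpoint path.
VUI_URL = "https://api.example.com/tts/vui"


def build_vui_request(api_key, text, voice="default", stream=False, run_async=False):
    """Build (but do not send) a POST request for the VUI TTS endpoint."""
    if not text:
        raise ValueError("text is required")
    body = {
        "text": text,
        "voice": voice,
        "stream": stream,
        # "async" is a Python keyword, so the argument is named run_async.
        "async": run_async,
    }
    return urllib.request.Request(
        VUI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API accepts a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_vui_request("YOUR_KEY_HERE", "Hello, world")
print(req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it. Building and sending are kept separate here so the request can be inspected first.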
Responses
Orpheus [TTS]
Generate audio from text using Orpheus voice model via Baseten API. Optimized with TRT-LLM on H100 MIG 40GB hardware. Generates ~83 tokens/second for real-time streaming. Audio format: 24kHz, 16-bit, mono WAV.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
prompt
string · required
The text to convert to speech
voice
string
Voice to use - English: 'tara', 'leah', 'jess', 'leo', 'dan', 'mia', 'zac', 'zoe'; French: 'pierre', 'amelie', 'marie'; German: 'jana', 'thomas', 'max'; etc.
Example: tara
Default: tara
max_tokens
number
Maximum tokens to generate
Default: 2000
stream
boolean
Whether to stream the response
Default: false
async
boolean
Whether to run asynchronously
Default: false
output_language
string
Language code - English: 'en' (high quality); French: 'fr' (high quality); German: 'de' (high quality); Korean: 'ko' (high quality); Mandarin: 'zh' (high quality); Spanish: 'es' (medium); Italian: 'it' (medium); Hindi: 'hi' (medium)
Example: en
output_style
string
Style of speech (e.g., 'cheerful', 'serious', 'excited')
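The description above states the audio format as 24kHz, 16-bit, mono WAV. As a rough sketch, a non-streamed response body could be checked against that format with Python's `wave` module. `describe_wav` and `make_silence` are hypothetical helpers, and the silent clip stands in for real response bytes.

```python
import io
import wave


def describe_wav(wav_bytes):
    """Read the header of a complete WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "bits_per_sample": wf.getsampwidth() * 8,
            "sample_rate": wf.getframerate(),
            "seconds": wf.getnframes() / wf.getframerate(),
        }


def make_silence(seconds, sample_rate=24000):
    """Create a silent 16-bit mono WAV as stand-in data for this example."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


info = describe_wav(make_silence(0.5))
print(info)
```

At this format, one second of audio is 24000 samples of 2 bytes each, so about 48000 bytes of sample data.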
Responses
Koroko [TTS]
Generate audio from text using Koroko, a frontier TTS model with just 82 million parameters. Offers efficient and high-quality speech synthesis. Audio format: 16-bit WAV.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice to use (if supported by the model)
stream
boolean
Whether to stream the response
Default: false
async
boolean
Whether to run asynchronously
Default: false
Responses
XTTS-V2 [TTS]
Generate audio from text using XTTS-V2 voice model with voice cloning capabilities in multiple languages. XTTS-V2 is a state-of-the-art text-to-speech model by Coqui. Audio format: WAV (16-bit, 24kHz).
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
speaker_voice
string · required
Base64 encoded audio file for voice cloning (6+ seconds recommended)
language
string
Target language code - English: 'en', Spanish: 'es', French: 'fr', German: 'de', Italian: 'it', Portuguese: 'pt', Polish: 'pl', Turkish: 'tr', Russian: 'ru', Dutch: 'nl', Czech: 'cs', Arabic: 'ar', Chinese: 'zh', Japanese: 'ja', Korean: 'ko', Hungarian: 'hu', Hindi: 'hi'
Example: en
Default: en
stream
boolean
Whether to stream the response
Default: false
async
boolean
Whether to run asynchronously
Default: false
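The `speaker_voice` field expects the reference recording as a Base64 string. A minimal sketch of preparing such a body follows. `build_xtts_payload` is a hypothetical helper, and plain standard Base64 (no data-URI prefix) is an assumption; the language codes are the ones listed above.

```python
import base64

# Language codes copied from the parameter list above.
XTTS_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh", "ja", "ko", "hu", "hi",
}


def build_xtts_payload(text, reference_audio, language="en", stream=False, run_async=False):
    """Return a request body dict with the reference audio Base64-encoded."""
    if not text:
        raise ValueError("text is required")
    if not reference_audio:
        raise ValueError("speaker_voice audio is required")
    if language not in XTTS_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return {
        "text": text,
        # Assumption: standard Base64 without a data-URI prefix.
        "speaker_voice": base64.b64encode(reference_audio).decode("ascii"),
        "language": language,
        "stream": stream,
        "async": run_async,
    }


# Stand-in bytes; in practice these would be read from a 6+ second recording.
fake_audio = b"not really audio"
payload = build_xtts_payload("Bonjour", fake_audio, language="fr")
print(payload["speaker_voice"])
```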
Responses
MARS6 [TTS]
Generate audio from text using MARS6 voice model with voice/prosody cloning capabilities in 10 languages. MARS6 is a frontier text-to-speech model by CAMB.AI. Audio format: AAC (adts stream) or FLAC, depending on stream_format parameter.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
audio_ref
string · required
Base64 encoded audio file for voice cloning (6-90 seconds recommended)
language
string · required
Target language code - English: 'en-us', French: 'fr-fr', German: 'de-de', Spanish: 'es-es', Italian: 'it-it', Portuguese: 'pt-pt', Chinese: 'zh-cn', Japanese: 'ja-jp', Korean: 'ko-kr', Dutch: 'nl-nl'
Example: en-us
ref_text
string
Text transcript of the reference audio (optional but recommended)
stream
boolean
Whether to stream the response
Default: true
stream_format
string · enum
Format for streaming: 'adts' for AAC or 'flac' for FLAC
Enum values: adts, flac
Default: adts
temperature
number
Temperature for generation
Default: 0.7
top_p
number
Top-p for generation
Default: 0.7
chunk_length
number
Text chunk length for splitting long input
Default: 200
max_new_tokens
number
Limit on max tokens (0 = unlimited)
Default: 0
repetition_penalty
number
Repetition penalty for generation
Default: 1.5
async
boolean
Whether to run asynchronously
Default: false
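The defaults and enum values above can be captured in one place so that a request body only overrides what it needs. This is a sketch under assumptions: `Mars6Options` and `build_mars6_payload` are hypothetical names, and sending every default explicitly is a design choice for this example, not something the API is known to require.

```python
from dataclasses import asdict, dataclass

# Values copied from the parameter list above.
MARS6_LANGUAGES = {
    "en-us", "fr-fr", "de-de", "es-es", "it-it",
    "pt-pt", "zh-cn", "ja-jp", "ko-kr", "nl-nl",
}
STREAM_FORMATS = {"adts", "flac"}


@dataclass
class Mars6Options:
    """Optional generation settings, initialised to the documented defaults."""
    stream: bool = True
    stream_format: str = "adts"
    temperature: float = 0.7
    top_p: float = 0.7
    chunk_length: int = 200
    max_new_tokens: int = 0  # 0 means unlimited
    repetition_penalty: float = 1.5


def build_mars6_payload(text, audio_ref_b64, language, ref_text=None, options=None):
    """Validate the enum-like fields and return a request body dict."""
    options = options or Mars6Options()
    if language not in MARS6_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if options.stream_format not in STREAM_FORMATS:
        raise ValueError(f"unsupported stream_format: {options.stream_format!r}")
    payload = {"text": text, "audio_ref": audio_ref_b64, "language": language}
    if ref_text is not None:
        payload["ref_text"] = ref_text
    payload.update(asdict(options))
    payload["async"] = False
    return payload


payload = build_mars6_payload("Hello", "BASE64_AUDIO_PLACEHOLDER", "en-us")
print(sorted(payload))
```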
Responses
ElevenLabs Multi-v2 [TTS]
Generate audio from text using ElevenLabs Multi-v2 voice model. Multilingual model supporting 29+ languages with high-quality natural voices.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice ID or name to use for synthesis
Default: Rachel
voice_id
string
Alternative parameter name for voice ID
language
string
Language code (en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, etc.)
Default: en
language_code
string
Alternative parameter name for language code
stream
boolean
Whether to stream the audio response
Default: false
format
string
Audio format to return (mp3_44100_128, mp3_44100_192, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000)
Default: mp3_44100_128
stability
number · max: 1
Voice stability (0.0-1.0)
Default: 0.5
similarity_boost
number · max: 1
Voice similarity boost (0.0-1.0)
Default: 0.75
style
number · max: 1
Style control (0.0-1.0)
Default: 0
speaking_rate
number
Speaking rate multiplier
Default: 1
text_normalization
string · enum
Text normalization mode
Enum values: auto, on, off
Default: auto
seed
integer
Random seed for reproducible audio generation
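Two of the constraints above are easy to check on the client before sending a request: the 0.0-1.0 range on `stability`, `similarity_boost`, and `style`, and the shape of the `format` names. The sketch below uses hypothetical helpers, and reading a name such as `mp3_44100_128` as codec, sample rate in Hz, and bitrate in kbps is an assumption based on the listed values.

```python
def parse_output_format(fmt):
    """Split a format name such as 'mp3_44100_128' into its parts."""
    codec, *numbers = fmt.split("_")
    if not numbers or len(numbers) > 2 or not all(n.isdigit() for n in numbers):
        raise ValueError(f"unrecognized format name: {fmt!r}")
    sample_rate = int(numbers[0])
    # Assumption: a third component, when present, is the bitrate in kbps.
    bitrate_kbps = int(numbers[1]) if len(numbers) == 2 else None
    return codec, sample_rate, bitrate_kbps


def check_voice_controls(stability=0.5, similarity_boost=0.75, style=0.0):
    """Return the three controls as a dict after checking the 0.0-1.0 range."""
    controls = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in controls.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return controls


print(parse_output_format("mp3_44100_128"))
print(check_voice_controls(stability=0.3))
```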
Responses
ElevenLabs Turbo v2.5 [TTS]
Generate audio from text using ElevenLabs Turbo v2.5 voice model. Ultra-fast TTS model with low latency, ideal for real-time applications.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice ID or name to use for synthesis
Default: Rachel
voice_id
string
Alternative parameter name for voice ID
language
string
Language code (en, es, fr, de, it, pt)
Default: en
stream
boolean
Whether to stream the audio response
Default: false
format
string
Audio format to return
Default: mp3_44100_128
stability
number · max: 1
Voice stability (0.0-1.0)
Default: 0.5
similarity_boost
number · max: 1
Voice similarity boost (0.0-1.0)
Default: 0.75
speaking_rate
number
Speaking rate multiplier
Default: 1
text_normalization
string · enum
Text normalization mode
Enum values: auto, on, off
Default: auto
Responses
ElevenLabs v3 [TTS]
Generate audio from text using ElevenLabs v3 voice model. Latest generation model with high-quality natural voices.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice ID or name to use for synthesis
Default: Rachel
voice_id
string
Alternative parameter name for voice ID
language
string
Language code (en, es, fr, de, it, pt, pl, hi, etc.)
Default: en
stream
boolean
Whether to stream the audio response
Default: false
format
string
Audio format to return
Default: mp3_44100_128
stability
number · max: 1
Voice stability (0.0-1.0)
Default: 0.5
similarity_boost
number · max: 1
Voice similarity boost (0.0-1.0)
Default: 0.75
speaking_rate
number
Speaking rate multiplier
Default: 1
text_normalization
string · enum
Text normalization mode
Enum values: auto, on, off
Default: auto
seed
integer
Random seed for reproducible audio generation
Responses
ElevenLabs TTV v3 [TTS]
Generate audio from text using ElevenLabs TTV v3 voice model. Optimized for synchronized text-to-video applications.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice ID or name to use for synthesis
Default: Rachel
voice_id
string
Alternative parameter name for voice ID
language
string
Language code (en, es, fr, de, it, pt)
Default: en
stream
boolean
Whether to stream the audio response
Default: false
format
string
Audio format to return
Default: mp3_44100_128
stability
number · max: 1
Voice stability (0.0-1.0)
Default: 0.5
similarity_boost
number · max: 1
Voice similarity boost (0.0-1.0)
Default: 0.75
speaking_rate
number
Speaking rate multiplier
Default: 1
text_normalization
string · enum
Text normalization mode
Enum values: auto, on, off
Default: auto
seed
integer
Random seed for reproducible audio generation
Responses
ElevenLabs Flash v2.5 [TTS]
Generate audio from text using ElevenLabs Flash v2.5 voice model. Fast TTS model with good balance between speed and quality.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to convert to speech
voice
string
Voice ID or name to use for synthesis
Default: Rachel
voice_id
string
Alternative parameter name for voice ID
language
string
Language code (en, es, fr, de, it, pt, pl)
Default: en
stream
boolean
Whether to stream the audio response
Default: false
format
string
Audio format to return
Default: mp3_44100_128
stability
number · max: 1
Voice stability (0.0-1.0)
Default: 0.5
similarity_boost
number · max: 1
Voice similarity boost (0.0-1.0)
Default: 0.75
speaking_rate
number
Speaking rate multiplier
Default: 1
text_normalization
string · enum
Text normalization mode
Enum values: auto, on, off
Default: auto
Responses
VUI [TTS] (India)
Generate audio from text using VUI voice model (region: India).
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Responses
Rate Limiting Response
type
string · required
A URI reference that identifies the problem.
title
string · required
A short, human-readable summary of the problem.
status
number · required
The HTTP status code.
instance
string
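The rate limiting response carries the fields `type`, `title`, `status`, and an optional `instance`, which resembles the Problem Details format for HTTP APIs. A minimal sketch of parsing such a body follows. `ProblemDetails` and `parse_problem` are hypothetical names, and the sample JSON is invented for illustration rather than captured from the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemDetails:
    type: str
    title: str
    status: int
    instance: Optional[str] = None


def parse_problem(body):
    """Parse an error body, insisting on the three required fields."""
    data = json.loads(body)
    missing = [key for key in ("type", "title", "status") if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return ProblemDetails(
        type=data["type"],
        title=data["title"],
        status=int(data["status"]),
        instance=data.get("instance"),
    )


# Illustrative body only; the real values returned by the API may differ.
sample = '{"type": "about:blank", "title": "Too Many Requests", "status": 429}'
problem = parse_problem(sample)
print(problem.status, problem.title)
```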
Twi SpeechT5 [TTS]
Synthesize Twi speech from text using a specified speaker embedding via Modal API.
Headers
Authorization
string · required
The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
Request Body
text
string · required
The text to synthesize into Twi speech.
speaker_embedding
number[] · minItems: 512 · maxItems: 512 · required
A 512-dimensional speaker embedding vector representing the target voice.
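Because `speaker_embedding` must contain exactly 512 numbers, a client can check the length before sending. A small sketch with a hypothetical helper `build_twi_payload`; the all-zero vector is only a placeholder and would not represent a real voice.

```python
EMBEDDING_SIZE = 512  # minItems and maxItems in the reference above


def build_twi_payload(text, speaker_embedding):
    """Return a request body dict after checking the embedding length."""
    if not text:
        raise ValueError("text is required")
    embedding = [float(x) for x in speaker_embedding]
    if len(embedding) != EMBEDDING_SIZE:
        raise ValueError(
            f"speaker_embedding must have {EMBEDDING_SIZE} values, got {len(embedding)}"
        )
    return {"text": text, "speaker_embedding": embedding}


# Placeholder embedding for illustration only.
payload = build_twi_payload("Akwaaba", [0.0] * EMBEDDING_SIZE)
print(len(payload["speaker_embedding"]))
```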
Responses
Synthesized audio waveform (array of floats)
audio
number[]
Raw waveform samples (float array, 16kHz)
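Since the response is a raw float array rather than an audio file, it has to be packed into a container before most players can open it. The sketch below writes the samples as a 16-bit mono WAV at 16 kHz. It assumes the floats lie in the range -1.0 to 1.0, which the reference does not state, and `floats_to_wav` is a hypothetical helper.

```python
import io
import struct
import wave


def floats_to_wav(samples, sample_rate=16000):
    """Pack float samples into 16-bit mono WAV bytes."""
    # Assumption: samples are in [-1.0, 1.0]; anything outside is clipped.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    pcm = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


# Stand-in for the "audio" array of a response: one second of silence.
wav_bytes = floats_to_wav([0.0] * 16000)
print(len(wav_bytes))
```

Writing `wav_bytes` to a file with a `.wav` extension would then give a playable clip.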