Orpheus
Generate audio from text using the Orpheus voice model. English text-to-speech with 8 high-quality voices and emotion tag support. Real-time streaming is supported. Audio format: 16-24 kHz, 16-bit, mono WAV.
Headers
Authorization string · required
The Authorization header is used to authenticate with the API using your API key. The value has the format Bearer YOUR_KEY_HERE.
Request Body
prompt string · required
The text to convert to speech.
voice string · enum
Voice to use. Available voices: 'tara' (female, conversational, clear), 'leah' (female, warm, gentle), 'jess' (female, energetic, youthful), 'leo' (male, authoritative, deep), 'dan' (male, friendly, casual), 'mia' (female, professional, articulate), 'zac' (male, enthusiastic, dynamic), 'zoe' (female, calm, soothing).
Enum values: tara, leah, jess, leo, dan, mia, zac, zoe
Default: tara
max_tokens number
Maximum tokens to generate.
Default: 2000
stream boolean
Whether to stream the response.
Default: false
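As an illustration of the Orpheus request fields above, the following minimal sketch assembles the headers and body for one request. The endpoint URL and the use of a JSON request body are assumptions, since neither is stated here; `build_orpheus_request` is a hypothetical helper, not part of any SDK.

```python
import json

# Hypothetical endpoint URL: this page does not give one, so substitute the real path.
ORPHEUS_URL = "https://api.example.com/orpheus"

# Voice names taken from the enum values listed above.
ORPHEUS_VOICES = ("tara", "leah", "jess", "leo", "dan", "mia", "zac", "zoe")


def build_orpheus_request(api_key, prompt, voice="tara", max_tokens=2000, stream=False):
    """Return (headers, body_bytes) for an Orpheus text-to-speech request."""
    if not prompt:
        raise ValueError("prompt is required")
    if voice not in ORPHEUS_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumed: the body is sent as JSON
    }
    payload = {
        "prompt": prompt,
        "voice": voice,
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return headers, json.dumps(payload).encode("utf-8")
```

The returned pair could then be passed to any HTTP client, for example `urllib.request.Request(ORPHEUS_URL, data=body, headers=headers, method="POST")`.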
Responses
Audio response in WAV format
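To sanity-check a non-streamed response, the WAV header can be read with Python's standard `wave` module. This is a sketch under the assumption that the full response body is available as bytes; `describe_wav` and `matches_documented_format` are hypothetical helper names.

```python
import io
import wave


def describe_wav(data):
    """Read the WAV header from response bytes and return its basic parameters."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_documented_format(info):
    """True if the audio is mono, 16-bit, with a sample rate between 16 and 24 kHz."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and 16000 <= info["sample_rate"] <= 24000
    )
```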
Aura2 EN
Deepgram Aura-2 English text-to-speech via SLNG Modal deployment. Streams natural voices with Deepgram's Aura-2 engine.
Headers
Authorization string · required
The Authorization header is used to authenticate with the API using your API key. The value has the format Bearer YOUR_KEY_HERE.
Request Body
text string · required
Text to convert to speech.
model string
Aura-2 model identifier (voice).
Default: aura-2-thalia-en
voice string
Alias for the model parameter.
encoding string
Audio encoding (linear16, mp3, opus).
Default: linear16
container string
Audio container (wav, mp3, none).
Default: none
sample_rate integer
Sample rate in Hz.
Default: 24000
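The Aura-2 request fields above can be assembled in the same way. The sketch below assumes a JSON request body and sends only `model` (not its `voice` alias); `build_aura2_request` is a hypothetical helper, and the endpoint URL is not shown because this page does not give one.

```python
import json

# Allowed values as listed in the parameter descriptions above.
AURA2_ENCODINGS = ("linear16", "mp3", "opus")
AURA2_CONTAINERS = ("wav", "mp3", "none")


def build_aura2_request(api_key, text, model="aura-2-thalia-en",
                        encoding="linear16", container="none", sample_rate=24000):
    """Return (headers, body_bytes) for an Aura-2 text-to-speech request."""
    if not text:
        raise ValueError("text is required")
    if encoding not in AURA2_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if container not in AURA2_CONTAINERS:
        raise ValueError(f"unsupported container: {container!r}")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumed: the body is sent as JSON
    }
    payload = {
        "text": text,
        "model": model,
        "encoding": encoding,
        "container": container,
        "sample_rate": sample_rate,
    }
    return headers, json.dumps(payload).encode("utf-8")
```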
Responses
Audio response
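With the defaults (encoding linear16, container none), the response would be headerless audio that most players cannot open directly. The sketch below wraps such data in a WAV container using the standard `wave` module. It assumes linear16 means 16-bit signed little-endian PCM and that the audio is mono; the channel count is not stated on this page. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm, sample_rate=24000, channels=1):
    """Wrap raw 16-bit PCM bytes in a WAV container and return the WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)    # assumed mono unless told otherwise
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Requesting container wav from the API directly avoids this step; the helper is only useful when raw PCM has already been received.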