SLNG.AI API

Documentation

Endpoint: https://api.slng.ai

Text to Speech [TTS]

POST https://api.slng.ai/v1/tts

Generate audio from text using the selected voice model.

Text to Speech [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Text to Speech [TTS] Request Body

  • text string · required
  • model string
  • voice string

Text to Speech [TTS] Responses

string · binary
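
As a minimal sketch of the request shape described above, the following builds (without sending) a POST to /v1/tts using only the Python standard library. The JSON body encoding and the Content-Type header are assumptions, since this page does not state how the body is encoded; build_tts_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.slng.ai"


def build_tts_request(api_key, text, model=None, voice=None):
    """Build (but do not send) a POST request for /v1/tts."""
    body = {"text": text}
    # model and voice are optional, so include them only when provided.
    if model is not None:
        body["model"] = model
    if voice is not None:
        body["voice"] = voice
    return urllib.request.Request(
        BASE_URL + "/v1/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            # Assumption: the API accepts a JSON-encoded body.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen and reading the response would yield the binary audio listed under Responses.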

VUI [TTS] (Default/USA)

POST https://api.slng.ai/v1/tts/vui

Generate audio from text using the VUI voice model (default region: USA).

VUI [TTS] (Default/USA) Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

VUI [TTS] (Default/USA) Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice to use for synthesis (optional)

    Default: default

  • stream boolean

    Whether to stream the audio response

    Default: false

  • async boolean

    Whether to use async prediction (returns prediction_id)

    Default: false

VUI [TTS] (Default/USA) Responses

string · binary
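
A small sketch of how the VUI request body could be assembled in Python. One practical detail: async is a reserved word in Python, so it cannot be a keyword argument and is set through a string key instead. The helper name vui_body and the run_async argument are hypothetical.

```python
def vui_body(text, voice="default", stream=False, run_async=False):
    """Return the VUI request body with the documented defaults."""
    return {
        "text": text,
        "voice": voice,
        "stream": stream,
        # "async" is a Python keyword, so it only appears as a dictionary key.
        "async": run_async,
    }
```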

Orpheus [TTS]

POST https://api.slng.ai/v1/tts/orpheus

Generate audio from text using the Orpheus voice model via the Baseten API. Optimized with TRT-LLM on H100 MIG 40GB hardware. Generates ~83 tokens/second for real-time streaming. Audio format: 24kHz, 16-bit, mono WAV.

Orpheus [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Orpheus [TTS] Request Body

  • prompt string · required

    The text to convert to speech

  • voice string

    Voice to use - English: 'tara', 'leah', 'jess', 'leo', 'dan', 'mia', 'zac', 'zoe'; French: 'pierre', 'amelie', 'marie'; German: 'jana', 'thomas', 'max'; etc.

    Example: tara
    Default: tara

  • max_tokens number

    Maximum tokens to generate

    Default: 2000

  • stream boolean

    Whether to stream the response

    Default: false

  • async boolean

    Whether to run asynchronously

    Default: false

  • output_language string

    Language code - English: 'en' (high quality); French: 'fr' (high quality); German: 'de' (high quality); Korean: 'ko' (high quality); Mandarin: 'zh' (high quality); Spanish: 'es' (medium); Italian: 'it' (medium); Hindi: 'hi' (medium)

    Example: en

  • output_style string

    Style of speech (e.g., 'cheerful', 'serious', 'excited')

Orpheus [TTS] Responses

string · binary
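
The Orpheus description states the audio format as 24kHz, 16-bit, mono WAV. As a sketch, the standard-library wave module can confirm those properties on a received file. describe_wav and make_silence are hypothetical helpers, and make_silence only produces stand-in data so the example runs without calling the API.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s)."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def make_silence(seconds, rate=24000):
    """Create an in-memory 16-bit mono WAV of silence (stand-in for a response)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

This check is meant for a complete, non-streamed response. A streamed WAV may carry a header whose length fields are not final, in which case the computed duration should not be trusted.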

Koroko [TTS]

POST https://api.slng.ai/v1/tts/koroko

Generate audio from text using Koroko, a frontier TTS model with just 82 million parameters. Offers efficient and high-quality speech synthesis. Audio format: 16-bit WAV.

Koroko [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Koroko [TTS] Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice to use (if supported by the model)

  • stream boolean

    Whether to stream the response

    Default: false

  • async boolean

    Whether to run asynchronously

    Default: false

Koroko [TTS] Responses

string · binary

XTTS-V2 [TTS]

POST https://api.slng.ai/v1/tts/xtts-v2

Generate audio from text using the XTTS-V2 voice model, which supports voice cloning in multiple languages. XTTS-V2 is a state-of-the-art text-to-speech model by Coqui. Audio format: WAV (16-bit, 24kHz).

XTTS-V2 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

XTTS-V2 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • speaker_voice string · required

    Base64 encoded audio file for voice cloning (6+ seconds recommended)

  • language string

    Target language code - English: 'en', Spanish: 'es', French: 'fr', German: 'de', Italian: 'it', Portuguese: 'pt', Polish: 'pl', Turkish: 'tr', Russian: 'ru', Dutch: 'nl', Czech: 'cs', Arabic: 'ar', Chinese: 'zh', Japanese: 'ja', Korean: 'ko', Hungarian: 'hu', Hindi: 'hi'

    Example: en
    Default: en

  • stream boolean

    Whether to stream the response

    Default: false

  • async boolean

    Whether to run asynchronously

    Default: false

XTTS-V2 [TTS] Responses

string · binary
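
Since speaker_voice must be a Base64 encoded audio file, the reference clip has to be encoded before it goes into the body. A minimal sketch, assuming the raw bytes of the clip have already been read from disk; xtts_body is a hypothetical helper name.

```python
import base64


def xtts_body(text, reference_audio, language="en", stream=False, run_async=False):
    """Build the XTTS-V2 request body from raw reference-audio bytes."""
    return {
        "text": text,
        # b64encode returns bytes; decode to str so the value is JSON-serializable.
        "speaker_voice": base64.b64encode(reference_audio).decode("ascii"),
        "language": language,
        "stream": stream,
        "async": run_async,
    }
```

In practice reference_audio would come from something like open(path, "rb").read() on a clip of at least six seconds, as recommended above.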

MARS6 [TTS]

POST https://api.slng.ai/v1/tts/mars6

Generate audio from text using the MARS6 voice model, which supports voice and prosody cloning in 10 languages. MARS6 is a frontier text-to-speech model by CAMB.AI. Audio format: AAC (ADTS stream) or FLAC, depending on the stream_format parameter.

MARS6 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

MARS6 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • audio_ref string · required

    Base64 encoded audio file for voice cloning (6-90 seconds recommended)

  • language string · required

    Target language code - English: 'en-us', French: 'fr-fr', German: 'de-de', Spanish: 'es-es', Italian: 'it-it', Portuguese: 'pt-pt', Chinese: 'zh-cn', Japanese: 'ja-jp', Korean: 'ko-kr', Dutch: 'nl-nl'

    Example: en-us

  • ref_text string

    Text transcript of the reference audio (optional but recommended)

  • stream boolean

    Whether to stream the response

    Default: true

  • stream_format string · enum

    Format for streaming: 'adts' for AAC or 'flac' for FLAC

    Enum values: adts, flac
    Default: adts

  • temperature number

    Temperature for generation

    Default: 0.7

  • top_p number

    Top-p for generation

    Default: 0.7

  • chunk_length number

    Text chunk length for splitting long input

    Default: 200

  • max_new_tokens number

    Limit on max tokens (0 = unlimited)

    Default: 0

  • repetition_penalty number

    Repetition penalty for generation

    Default: 1.5

  • async boolean

    Whether to run asynchronously

    Default: false

MARS6 [TTS] Responses

string · binary
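
MARS6 has three required fields and an enum that determines the container of the returned audio, so validating them on the client side can catch mistakes before a request is made. The sketch below takes the language codes and enum values from the list above; the helper name mars6_body and the file-extension mapping are illustrative choices, not part of the API.

```python
MARS6_LANGUAGES = {
    "en-us", "fr-fr", "de-de", "es-es", "it-it",
    "pt-pt", "zh-cn", "ja-jp", "ko-kr", "nl-nl",
}

# Suggested file extensions for saving the response (an assumption of this sketch).
STREAM_FORMAT_EXTENSIONS = {"adts": ".aac", "flac": ".flac"}


def mars6_body(text, audio_ref_b64, language, ref_text=None, stream_format="adts"):
    """Validate the documented constraints and return the request body."""
    if language not in MARS6_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if stream_format not in STREAM_FORMAT_EXTENSIONS:
        raise ValueError(f"unsupported stream_format: {stream_format!r}")
    body = {
        "text": text,
        "audio_ref": audio_ref_b64,
        "language": language,
        "stream_format": stream_format,
    }
    # ref_text is optional but recommended, so send it only when available.
    if ref_text is not None:
        body["ref_text"] = ref_text
    return body
```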

ElevenLabs Multi-v2 [TTS]

POST https://api.slng.ai/v1/tts/elevenlabs/multi-v2

Generate audio from text using the ElevenLabs Multi-v2 voice model. Multilingual model supporting 29+ languages with high-quality natural voices.

ElevenLabs Multi-v2 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

ElevenLabs Multi-v2 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice ID or name to use for synthesis

    Default: Rachel

  • voice_id string

    Alternative parameter name for voice ID

  • language string

    Language code (en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, etc.)

    Default: en

  • language_code string

    Alternative parameter name for language code

  • stream boolean

    Whether to stream the audio response

    Default: false

  • format string

    Audio format to return (mp3_44100_128, mp3_44100_192, pcm_16000, pcm_22050, pcm_24000, pcm_44100, ulaw_8000)

    Default: mp3_44100_128

  • stability number · max: 1

    Voice stability (0.0-1.0)

    Default: 0.5

  • similarity_boost number · max: 1

    Voice similarity boost (0.0-1.0)

    Default: 0.75

  • style number · max: 1

    Style control (0.0-1.0)

    Default: 0

  • speaking_rate number

    Speaking rate multiplier

    Default: 1

  • text_normalization string · enum

    Text normalization mode

    Enum values: auto, on, off
    Default: auto

  • seed integer

    Random seed for reproducible audio generation

ElevenLabs Multi-v2 [TTS] Responses

string · binary
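
Two details of this body lend themselves to a short sketch: the format values and the 0.0-1.0 ranges. The format names appear to follow a codec_samplerate_bitrate pattern (for example mp3_44100_128); that reading is inferred from the names listed above rather than stated on this page. All helper names are hypothetical.

```python
def parse_audio_format(fmt):
    """Split e.g. 'mp3_44100_128' into (codec, sample_rate_hz, bitrate_kbps or None)."""
    parts = fmt.split("_")
    codec = parts[0]
    sample_rate = int(parts[1])
    # PCM and u-law formats in the list carry no bitrate component.
    bitrate = int(parts[2]) if len(parts) > 2 else None
    return codec, sample_rate, bitrate


def check_unit_range(name, value):
    """Reject values outside the documented 0.0-1.0 range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return value


def multi_v2_body(text, voice="Rachel", fmt="mp3_44100_128",
                  stability=0.5, similarity_boost=0.75, style=0.0):
    """Build a request body using the documented defaults."""
    return {
        "text": text,
        "voice": voice,
        "format": fmt,
        "stability": check_unit_range("stability", stability),
        "similarity_boost": check_unit_range("similarity_boost", similarity_boost),
        "style": check_unit_range("style", style),
    }
```

Knowing the sample rate is mainly useful for the pcm_* formats, where the response presumably has no header describing it.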

ElevenLabs Turbo v2.5 [TTS]

POST https://api.slng.ai/v1/tts/elevenlabs/turbo-v2-5

Generate audio from text using the ElevenLabs Turbo v2.5 voice model. Ultra-fast TTS model with low latency, ideal for real-time applications.

ElevenLabs Turbo v2.5 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

ElevenLabs Turbo v2.5 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice ID or name to use for synthesis

    Default: Rachel

  • voice_id string

    Alternative parameter name for voice ID

  • language string

    Language code (en, es, fr, de, it, pt)

    Default: en

  • stream boolean

    Whether to stream the audio response

    Default: false

  • format string

    Audio format to return

    Default: mp3_44100_128

  • stability number · max: 1

    Voice stability (0.0-1.0)

    Default: 0.5

  • similarity_boost number · max: 1

    Voice similarity boost (0.0-1.0)

    Default: 0.75

  • speaking_rate number

    Speaking rate multiplier

    Default: 1

  • text_normalization string · enum

    Text normalization mode

    Enum values: auto, on, off
    Default: auto

ElevenLabs Turbo v2.5 [TTS] Responses

string · binary

ElevenLabs v3 [TTS]

POST https://api.slng.ai/v1/tts/elevenlabs/v3

Generate audio from text using the ElevenLabs v3 voice model. Latest generation model with high-quality natural voices.

ElevenLabs v3 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

ElevenLabs v3 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice ID or name to use for synthesis

    Default: Rachel

  • voice_id string

    Alternative parameter name for voice ID

  • language string

    Language code (en, es, fr, de, it, pt, pl, hi, etc.)

    Default: en

  • stream boolean

    Whether to stream the audio response

    Default: false

  • format string

    Audio format to return

    Default: mp3_44100_128

  • stability number · max: 1

    Voice stability (0.0-1.0)

    Default: 0.5

  • similarity_boost number · max: 1

    Voice similarity boost (0.0-1.0)

    Default: 0.75

  • speaking_rate number

    Speaking rate multiplier

    Default: 1

  • text_normalization string · enum

    Text normalization mode

    Enum values: auto, on, off
    Default: auto

  • seed integer

    Random seed for reproducible audio generation

ElevenLabs v3 [TTS] Responses

string · binary

ElevenLabs TTV v3 [TTS]

POST https://api.slng.ai/v1/tts/elevenlabs/ttv-v3

Generate audio from text using the ElevenLabs TTV v3 voice model. Optimized for synchronized text-to-video applications.

ElevenLabs TTV v3 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

ElevenLabs TTV v3 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice ID or name to use for synthesis

    Default: Rachel

  • voice_id string

    Alternative parameter name for voice ID

  • language string

    Language code (en, es, fr, de, it, pt)

    Default: en

  • stream boolean

    Whether to stream the audio response

    Default: false

  • format string

    Audio format to return

    Default: mp3_44100_128

  • stability number · max: 1

    Voice stability (0.0-1.0)

    Default: 0.5

  • similarity_boost number · max: 1

    Voice similarity boost (0.0-1.0)

    Default: 0.75

  • speaking_rate number

    Speaking rate multiplier

    Default: 1

  • text_normalization string · enum

    Text normalization mode

    Enum values: auto, on, off
    Default: auto

  • seed integer

    Random seed for reproducible audio generation

ElevenLabs TTV v3 [TTS] Responses

string · binary

ElevenLabs Flash v2.5 [TTS]

POST https://api.slng.ai/v1/tts/elevenlabs/flash-v2-5

Generate audio from text using the ElevenLabs Flash v2.5 voice model. Fast TTS model with a good balance between speed and quality.

ElevenLabs Flash v2.5 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

ElevenLabs Flash v2.5 [TTS] Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice ID or name to use for synthesis

    Default: Rachel

  • voice_id string

    Alternative parameter name for voice ID

  • language string

    Language code (en, es, fr, de, it, pt, pl)

    Default: en

  • stream boolean

    Whether to stream the audio response

    Default: false

  • format string

    Audio format to return

    Default: mp3_44100_128

  • stability number · max: 1

    Voice stability (0.0-1.0)

    Default: 0.5

  • similarity_boost number · max: 1

    Voice similarity boost (0.0-1.0)

    Default: 0.75

  • speaking_rate number

    Speaking rate multiplier

    Default: 1

  • text_normalization string · enum

    Text normalization mode

    Enum values: auto, on, off
    Default: auto

ElevenLabs Flash v2.5 [TTS] Responses

string · binary

VUI [TTS] (India)

POST https://api.slng.ai/v1/in/tts/vui

Generate audio from text using the VUI voice model (region: India).

VUI [TTS] (India) Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

VUI [TTS] (India) Responses

Rate Limiting Response

  • type string · required

    A URI reference that identifies the problem.

  • title string · required

    A short, human-readable summary of the problem.

  • status number · required

    The HTTP status code.

  • instance string
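
The fields of the Rate Limiting Response resemble the "problem details" layout used by many HTTP APIs. A minimal parsing sketch follows, assuming the error body is JSON (this page does not say so explicitly). The Problem class and parse_problem are hypothetical names, and the sample values in the usage note are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    type: str
    title: str
    status: int
    instance: Optional[str] = None  # the only field not marked required


def parse_problem(raw):
    """Parse an error body and check that the required fields are present."""
    data = json.loads(raw)
    missing = [key for key in ("type", "title", "status") if key not in data]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return Problem(
        type=data["type"],
        title=data["title"],
        status=int(data["status"]),
        instance=data.get("instance"),
    )
```

For example, parse_problem('{"type": "about:blank", "title": "Too Many Requests", "status": 429}') yields a Problem whose status is 429, the conventional HTTP code for rate limiting.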

VUI [TTS] (EU)

POST https://api.slng.ai/v1/eu/tts/vui

Generate audio from text using the VUI voice model (region: EU).

VUI [TTS] (EU) Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.
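
The three VUI endpoints in this document differ only in a region segment of the path. A small sketch of selecting the URL by region; the paths are taken from this page, while the region keys ("us", "in", "eu") and the helper name vui_url are naming choices of this example.

```python
BASE_URL = "https://api.slng.ai"

# Paths as listed for the three VUI endpoints in this document.
VUI_PATHS = {
    "us": "/v1/tts/vui",     # default region (USA)
    "in": "/v1/in/tts/vui",  # India
    "eu": "/v1/eu/tts/vui",  # EU
}


def vui_url(region="us"):
    """Return the full VUI endpoint URL for a region key."""
    try:
        return BASE_URL + VUI_PATHS[region]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None
```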


Twi SpeechT5 [TTS]

POST https://api.slng.ai/v1/tts/twi-speecht5

Synthesize Twi speech from text using a specified speaker embedding via the Modal API.

Twi SpeechT5 [TTS] Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Twi SpeechT5 [TTS] Request Body

  • text string · required

    The text to synthesize into Twi speech.

  • speaker_embedding number[] · minItems: 512 · maxItems: 512 · required

    A 512-dimensional speaker embedding vector representing the target voice.

Twi SpeechT5 [TTS] Responses

Synthesized audio waveform (array of floats)

  • audio number[]

    Raw waveform samples (float array, 16kHz)
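
Unlike the other endpoints, this one returns raw float samples rather than an audio file, so the caller has to package them. The sketch below checks the 512-item embedding constraint and writes the samples to a 16-bit mono WAV at 16kHz using the standard library. It assumes the samples lie in the range -1.0 to 1.0, which this page does not state; out-of-range values are clipped. Helper names are hypothetical.

```python
import io
import struct
import wave

EMBEDDING_SIZE = 512
SAMPLE_RATE_HZ = 16000


def twi_body(text, speaker_embedding):
    """Build the request body, enforcing the 512-item embedding length."""
    if len(speaker_embedding) != EMBEDDING_SIZE:
        raise ValueError(f"speaker_embedding must have {EMBEDDING_SIZE} items")
    return {"text": text, "speaker_embedding": [float(x) for x in speaker_embedding]}


def floats_to_wav(samples, rate=SAMPLE_RATE_HZ):
    """Convert float samples (assumed in [-1.0, 1.0]) to 16-bit mono WAV bytes."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        # Scale to the signed 16-bit range and pack as little-endian.
        pcm += struct.pack("<h", int(clipped * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()
```

Given a parsed response, floats_to_wav(response["audio"]) would produce bytes that can be written straight to a .wav file.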


MCP Server

POST https://api.slng.ai/mcp