SLNG.AI API

Open Source @ SLNG

Endpoint: https://api.slng.ai

VUI

POST https://api.slng.ai/v1/tts/slng/vui

Generate audio from text using the VUI voice model (default region: USA), which provides high-quality text-to-speech synthesis.

VUI Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

VUI Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice to use for synthesis (optional)

    Default: default

  • stream boolean

    Whether to stream the audio response

    Default: false

  • async boolean

    Whether to use async prediction (returns prediction_id)

    Default: false

VUI Responses

Audio response or async prediction ID

string · binary
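
As a minimal sketch of calling this endpoint, the following uses only Python's standard library. The helper names (build_vui_request, synthesize_to_file), the JSON Content-Type header, and the idea of writing the binary response straight to a file are assumptions for illustration; the reference above specifies only the URL, the Authorization header, and the body fields.

```python
import json
import urllib.request

VUI_URL = "https://api.slng.ai/v1/tts/slng/vui"


def build_vui_request(api_key, text, voice="default", stream=False):
    """Build (but do not send) a POST request for the VUI endpoint."""
    body = {"text": text, "voice": voice, "stream": stream}
    return urllib.request.Request(
        VUI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the request body is sent as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the binary audio response to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Building the request separately from sending it makes the payload easy to inspect before any network call; synthesize_to_file(build_vui_request(key, "Hello"), "hello.bin") would then perform the actual call.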

Orpheus

POST https://api.slng.ai/v1/tts/slng/orpheus

Generate audio from text using the Orpheus voice model. Optimized with TRT-LLM on H100 MIG 40GB hardware. Generates ~83 tokens/second for real-time streaming. Audio format: 24kHz, 16-bit, mono WAV.

Orpheus Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Orpheus Request Body

  • prompt string · required

    The text to convert to speech

  • voice string

    Voice to use - English: 'tara', 'leah', 'jess', 'leo', 'dan', 'mia', 'zac', 'zoe'; French: 'pierre', 'amelie', 'marie'; German: 'jana', 'thomas', 'max'; etc.

    Example: tara
    Default: tara

  • max_tokens number

    Maximum tokens to generate

    Default: 2000

  • stream boolean

    Whether to stream the response

    Default: false

  • async boolean

    Whether to run asynchronously

    Default: false

  • output_language string

    Language code - English: 'en' (high quality); French: 'fr' (high quality); German: 'de' (high quality); Korean: 'ko' (high quality); Mandarin: 'zh' (high quality); Spanish: 'es' (medium); Italian: 'it' (medium); Hindi: 'hi' (medium)

    Example: en

  • output_style string

    Style of speech (e.g., 'cheerful', 'serious', 'excited')

Orpheus Responses

Audio response or async prediction ID

string · binary
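
Note that Orpheus takes the input text in a field named prompt, not text. The sketch below assembles a request body that leaves out unset optional fields, and checks a returned file against the documented 24kHz, 16-bit, mono WAV format using Python's wave module. The function names are hypothetical, and the check assumes a complete, non-streamed WAV file with a valid header.

```python
import io
import wave


def build_orpheus_body(prompt, voice="tara", max_tokens=2000,
                       output_language=None, output_style=None):
    """Assemble the JSON body, leaving out optional fields that are unset."""
    body = {"prompt": prompt, "voice": voice, "max_tokens": max_tokens}
    if output_language is not None:
        body["output_language"] = output_language
    if output_style is not None:
        body["output_style"] = output_style
    return body


def describe_wav(wav_bytes):
    """Return (sample_rate_hz, bits_per_sample, channels, seconds)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    return rate, bits, channels, seconds


def matches_documented_format(wav_bytes):
    """True if the audio is 24 kHz, 16-bit, mono, as documented above."""
    rate, bits, channels, _ = describe_wav(wav_bytes)
    return (rate, bits, channels) == (24000, 16, 1)
```

A streamed response may not carry a final length in its header, so this check is intended for responses requested with stream set to false.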

Kokoro

POST https://api.slng.ai/v1/tts/slng/kokoro

Generate audio from text using Kokoro, a frontier TTS model with just 82 million parameters. Offers efficient and high-quality speech synthesis. Audio format: 16-bit WAV.

Kokoro Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

Kokoro Request Body

  • text string · required

    The text to convert to speech

  • voice string

    Voice to use (if supported by the model)

  • stream boolean

    Whether to stream the response

    Default: false

  • async boolean

    Whether to run asynchronously

    Default: false

Kokoro Responses

Audio response or async prediction ID

string · binary
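
The reference does not say how audio is delivered when stream is true. The sketch below assumes the streamed audio arrives as an ordinary HTTP response body that can be read incrementally, and copies it in fixed-size chunks so playback or writing can start before the full audio is available. Both copy_stream and read_all_streamed are hypothetical helpers; any object with a read(n) method works as the response.

```python
import io


def copy_stream(response, sink, chunk_size=4096):
    """Copy a file-like response to `sink` in chunks; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read marks the end of the body
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def read_all_streamed(response, chunk_size=4096):
    """Collect a streamed body into memory (convenient for short clips)."""
    sink = io.BytesIO()
    copy_stream(response, sink, chunk_size)
    return sink.getvalue()
```

The object returned by urllib.request.urlopen exposes read(n), so it could be passed directly as response, with an open binary file as sink.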

XTTS-V2

POST https://api.slng.ai/v1/tts/slng/xtts-v2

Generate audio from text using the XTTS-V2 voice model, which supports voice cloning in multiple languages. XTTS-V2 is a state-of-the-art text-to-speech model by Coqui. Audio format: WAV (16-bit, 24kHz).

XTTS-V2 Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

XTTS-V2 Request Body

  • text string · required

    The text to convert to speech

  • speaker_voice string · required

    Base64 encoded audio file for voice cloning (6+ seconds recommended)

  • language string

    Target language code - English: 'en', Spanish: 'es', French: 'fr', German: 'de', Italian: 'it', Portuguese: 'pt', Polish: 'pl', Turkish: 'tr', Russian: 'ru', Dutch: 'nl', Czech: 'cs', Arabic: 'ar', Chinese: 'zh', Japanese: 'ja', Korean: 'ko', Hungarian: 'hu', Hindi: 'hi'

    Example: en
    Default: en

  • stream boolean

    Whether to stream the response

    Default: false

  • async boolean

    Whether to run asynchronously

    Default: false

XTTS-V2 Responses

Audio response or async prediction ID

string · binary
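
The speaker_voice field carries the reference recording as a base64 string. The sketch below shows one way to prepare it; it assumes standard (not URL-safe) base64 with no data-URI prefix, which the reference does not spell out, and the function names are hypothetical.

```python
import base64


def encode_reference_audio(audio_bytes):
    """Encode raw audio file bytes as an ASCII base64 string."""
    return base64.b64encode(audio_bytes).decode("ascii")


def build_xtts_body(text, audio_bytes, language="en"):
    """Assemble the JSON body for a voice-cloning request."""
    return {
        "text": text,
        "speaker_voice": encode_reference_audio(audio_bytes),
        "language": language,
    }
```

The audio bytes would typically come from reading the reference file in binary mode, for example pathlib.Path("reference.wav").read_bytes(), where the filename is a placeholder.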

MARS6

POST https://api.slng.ai/v1/tts/slng/mars6

Generate audio from text using the MARS6 voice model, which supports voice and prosody cloning in 10 languages. MARS6 is a frontier text-to-speech model by CAMB.AI. Audio format: AAC (ADTS stream) or FLAC, depending on the stream_format parameter.

MARS6 Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

MARS6 Request Body

  • text string · required

    The text to convert to speech

  • audio_ref string · required

    Base64 encoded audio file for voice cloning (6-90 seconds recommended)

  • language string · required

    Target language code - English: 'en-us', French: 'fr-fr', German: 'de-de', Spanish: 'es-es', Italian: 'it-it', Portuguese: 'pt-pt', Chinese: 'zh-cn', Japanese: 'ja-jp', Korean: 'ko-kr', Dutch: 'nl-nl'

    Example: en-us

  • ref_text string

    Text transcript of the reference audio (optional but recommended)

  • stream boolean

    Whether to stream the response

    Default: true

  • stream_format string · enum

    Format for streaming: 'adts' for AAC or 'flac' for FLAC

    Enum values: adts, flac
    Default: adts

  • temperature number

    Temperature for generation

    Default: 0.7

  • top_p number

    Top-p for generation

    Default: 0.7

  • chunk_length number

    Text chunk length for splitting long input

    Default: 200

  • max_new_tokens number

    Limit on max tokens (0 = unlimited)

    Default: 0

  • repetition_penalty number

    Repetition penalty for generation

    Default: 1.5

  • async boolean

    Whether to run asynchronously

    Default: false

MARS6 Responses

Audio response or async prediction ID

string · binary
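
MARS6 has more required fields and tuning parameters than the other endpoints, so checking a request locally before sending it can catch simple mistakes. The sketch below validates the language code and stream_format against the values listed above and only includes generation parameters the caller overrides. The function names and the .aac/.flac file extensions are assumptions for illustration, not part of the API.

```python
MARS6_LANGUAGES = frozenset({
    "en-us", "fr-fr", "de-de", "es-es", "it-it",
    "pt-pt", "zh-cn", "ja-jp", "ko-kr", "nl-nl",
})
# Assumption: conventional file extensions for the two stream formats.
STREAM_FORMAT_EXTENSIONS = {"adts": ".aac", "flac": ".flac"}
GENERATION_FIELDS = frozenset({
    "temperature", "top_p", "chunk_length",
    "max_new_tokens", "repetition_penalty",
})


def build_mars6_body(text, audio_ref_b64, language, ref_text=None,
                     stream_format="adts", **overrides):
    """Validate inputs and assemble the JSON body for MARS6."""
    if language not in MARS6_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if stream_format not in STREAM_FORMAT_EXTENSIONS:
        raise ValueError(f"unsupported stream_format: {stream_format!r}")
    unknown = set(overrides) - GENERATION_FIELDS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = {
        "text": text,
        "audio_ref": audio_ref_b64,
        "language": language,
        "stream_format": stream_format,
    }
    if ref_text is not None:
        body["ref_text"] = ref_text
    body.update(overrides)  # e.g. temperature=0.5
    return body


def output_extension(stream_format):
    """Pick a file extension matching the requested stream format."""
    return STREAM_FORMAT_EXTENSIONS[stream_format]
```

Omitting parameters the caller did not set leaves the documented server-side defaults (for example temperature 0.7 and repetition_penalty 1.5) in effect.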

TWI SpeechT5 TTS

POST https://api.slng.ai/v1/tts/slng/twi-speecht5

Synthesize speech from text using the TWI SpeechT5 model with customizable 512-dimensional speaker embeddings. Hosted on SLNG infrastructure for low-latency synthesis.

TWI SpeechT5 TTS Headers

  • Authorization string · required

    The Authorization header is used to authenticate with the API using your API key. Value is of the format Bearer YOUR_KEY_HERE.

TWI SpeechT5 TTS Request Body

  • text string · required

    The text to synthesize into speech.

  • speaker_embedding number[] · minItems: 512 · maxItems: 512 · required

    A 512-dimensional speaker embedding vector representing the target voice.

TWI SpeechT5 TTS Responses

Synthesized audio waveform (array of floats)

  • audio number[]

    Raw waveform samples as float array (16kHz sample rate, mono channel). Values typically range from -1.0 to 1.0.

  • fallback boolean

    True if fallback silent audio was returned due to an error

  • error string

    Error message if fallback audio was returned

    Error message if fallback audio was returned
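
Unlike the other endpoints, this one returns raw float samples rather than an encoded audio file, so the client has to package them before playback. The sketch below checks the embedding length and converts a parsed response into a 16-bit, 16kHz mono WAV using the standard library. It assumes the response is a JSON object with the audio, fallback, and error fields described above; the function names, the clipping to [-1.0, 1.0], and the choice to raise on fallback are illustrative decisions, not documented behavior.

```python
import io
import struct
import wave

EMBEDDING_SIZE = 512
SAMPLE_RATE_HZ = 16000


def build_twi_body(text, speaker_embedding):
    """Assemble the JSON body after checking the embedding length."""
    if len(speaker_embedding) != EMBEDDING_SIZE:
        raise ValueError(
            f"speaker_embedding must have {EMBEDDING_SIZE} values, "
            f"got {len(speaker_embedding)}"
        )
    return {
        "text": text,
        "speaker_embedding": [float(x) for x in speaker_embedding],
    }


def response_to_wav(response):
    """Convert a parsed response dict into 16-bit PCM WAV bytes."""
    if response.get("fallback"):
        # Design choice: treat silent fallback audio as a failure.
        raise RuntimeError(response.get("error") or "fallback audio returned")
    frames = bytearray()
    for sample in response["audio"]:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(bytes(frames))
    return buf.getvalue()
```

Raising on fallback keeps silent audio from being mistaken for a successful synthesis; a caller that prefers silence over an exception could check the fallback field itself instead.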