Inworld Max 1.5

Authorizations

Authorization

string

header

required

API key issued by SLNG. Pass as Authorization: Bearer <token>.

Headers

X-Region-Override

enum<string>

Target region override. Auto-selected if not provided.

Available options:

us-east-1
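As a minimal sketch of the two headers described above, the helper below builds them as a plain dictionary. The function name `build_headers` and the `Content-Type` header are assumptions added for illustration; only `Authorization` and `X-Region-Override` come from this reference.

```python
from typing import Dict, Optional


def build_headers(api_key: str, region: Optional[str] = None) -> Dict[str, str]:
    """Build request headers. `build_headers` is a hypothetical helper name."""
    headers = {
        # Documented: the API key is passed as a Bearer token.
        "Authorization": f"Bearer {api_key}",
        # Assumption: the JSON body is declared explicitly.
        "Content-Type": "application/json",
    }
    # The region header is optional; the region is auto-selected when omitted.
    if region is not None:
        headers["X-Region-Override"] = region
    return headers
```

Leaving `region` as `None` keeps the header out of the request entirely, which matches the documented auto-selection behavior.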

Body

application/json

Inworld Max 1.5 synthesis request, SLNG-hosted. Audio output is configured with flat fields (encoding, sample_rate, bit_rate, speaking_rate) that the gateway maps to Inworld's audioConfig.

text

string

required

Text to synthesize. Max 2,000 characters.

Required string length: 1 - 2000
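Longer inputs have to be split on the client side before they fit within the 2,000-character limit. The sketch below is one possible approach, not something the API prescribes: it packs whitespace-separated words greedily into chunks, and `split_text` is a hypothetical helper name. Note that it collapses runs of whitespace.

```python
from typing import List

MAX_CHARS = 2000  # documented maximum length of `text`


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Greedily pack words into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. Splitting on sentence boundaries would likely sound more natural, at the cost of a more involved splitter.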

voice

string

default:Ashley

Inworld voice ID. Voices are multilingual: each has a native language but can speak any supported language via the language parameter. Results are best when language matches the voice's native language. See the full catalog at https://docs.slng.ai/voices/inworld.

modelId

enum<string>

default:inworld-tts-1.5-max

ID of the Inworld TTS model.

Available options:

inworld-tts-1.5-max

language

enum<string>

BCP-47 language tag for the language in which the voice should speak the text. Optional: when omitted, Inworld uses the voice's original prompt and auto-detects the language from the text. Voices are multilingual, but results are best when this matches the voice's native language. The options below are the 15 core languages supported by inworld-tts-1.5-max (inworld-tts-2 additionally supports ~85 experimental languages). An invalid code returns an error.

Available options:

ar-SA, de-DE, en-US, es-ES, fr-FR, he-IL, hi-IN, it-IT, ja-JP, ko-KR, nl-NL, pl-PL, pt-BR, ru-RU, zh-CN
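Because an invalid code returns an error, it can be convenient to check the tag locally before sending. The following is a sketch under that assumption; `language_field` is a hypothetical helper, and the set simply mirrors the 15 options listed above.

```python
from typing import Dict, Optional

# The 15 core languages listed for inworld-tts-1.5-max.
SUPPORTED_LANGUAGES = frozenset({
    "ar-SA", "de-DE", "en-US", "es-ES", "fr-FR",
    "he-IL", "hi-IN", "it-IT", "ja-JP", "ko-KR",
    "nl-NL", "pl-PL", "pt-BR", "ru-RU", "zh-CN",
})


def language_field(language: Optional[str]) -> Dict[str, str]:
    """Return the body fragment for `language`, or {} to let it be auto-detected."""
    if language is None:
        # Omitting the field leaves language detection to Inworld.
        return {}
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language tag: {language!r}")
    return {"language": language}
```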

deliveryMode

enum<string>

Only applies to inworld-tts-2. Controls output variation; ignored on Max 1.5.

Available options:

DELIVERY_MODE_UNSPECIFIED, STABLE, BALANCED, CREATIVE

temperature

number

default:1

Higher values produce more expressive output; lower values produce more deterministic output. Range (0, 2].

Required range: 0 < x <= 2
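The range is open at 0 and closed at 2, so a value of exactly 0 is rejected while 2 is accepted. A small client-side check, sketched here with the hypothetical name `check_temperature`, makes that boundary explicit:

```python
def check_temperature(value: float = 1.0) -> float:
    """Validate `temperature` against the documented range 0 < x <= 2."""
    if not 0 < value <= 2:
        raise ValueError(f"temperature must satisfy 0 < x <= 2, got {value}")
    return value
```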

timestampType

enum<string>

default:TIMESTAMP_TYPE_UNSPECIFIED

Controls timestamp metadata returned with the audio. Requesting timestamps adds latency.

Available options:

TIMESTAMP_TYPE_UNSPECIFIED, WORD, CHARACTER

applyTextNormalization

enum<string>

default:APPLY_TEXT_NORMALIZATION_UNSPECIFIED

Expands numbers, dates, and abbreviations before synthesis. Setting this to OFF may reduce latency.

Available options:

APPLY_TEXT_NORMALIZATION_UNSPECIFIED, ON, OFF

encoding

enum<string>

default:MP3

Output audio format. Maps to Inworld audioConfig.audioEncoding.

Available options:

LINEAR16, MP3, OGG_OPUS, ALAW, MULAW, FLAC, PCM, WAV

sample_rate

enum<integer>

default:48000

Output sample rate in Hz. Maps to Inworld audioConfig.sampleRateHertz.

Available options:

8000, 16000, 22050, 24000, 32000, 44100, 48000

bit_rate

integer

Bits per second. Only applies to compressed formats (MP3, OGG_OPUS). Maps to Inworld audioConfig.bitRate.
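The three flat audio fields above interact: `bit_rate` is meaningful only for MP3 and OGG_OPUS. The sketch below assembles them and omits `bit_rate` for other encodings. Dropping the field is a client-side design choice made here for illustration, not documented gateway behavior; `audio_fields` is a hypothetical helper, and the source does not list valid `bit_rate` values.

```python
from typing import Dict, Optional, Union

ENCODINGS = {"LINEAR16", "MP3", "OGG_OPUS", "ALAW", "MULAW", "FLAC", "PCM", "WAV"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}
COMPRESSED = {"MP3", "OGG_OPUS"}  # formats to which bit_rate applies


def audio_fields(
    encoding: str = "MP3",
    sample_rate: int = 48000,
    bit_rate: Optional[int] = None,
) -> Dict[str, Union[str, int]]:
    """Build the flat audio fields of the request body."""
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    fields: Dict[str, Union[str, int]] = {
        "encoding": encoding,
        "sample_rate": sample_rate,
    }
    # Only send bit_rate where the reference says it applies.
    if bit_rate is not None and encoding in COMPRESSED:
        fields["bit_rate"] = bit_rate
    return fields
```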

speaking_rate

number

default:1

Playback speed. Values below 0.8 are not recommended for quality reasons. Maps to Inworld audioConfig.speakingRate.

Required range: 0.5 <= x <= 1.5
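Putting the body fields together, a request might be prepared as sketched below. Several things are assumptions: the endpoint URL is a placeholder (this reference does not state it), POST is assumed as the HTTP method, and `build_request` is a hypothetical helper. The sketch only constructs the request object and does not send it.

```python
import json
import urllib.request

# Placeholder endpoint: the real URL is not given in this reference.
ENDPOINT = "https://api.example.invalid/tts/inworld-max-1.5"


def build_request(
    api_key: str, text: str, speaking_rate: float = 1.0
) -> urllib.request.Request:
    """Prepare (but do not send) a synthesis request with documented defaults."""
    if not 1 <= len(text) <= 2000:
        raise ValueError("text must be 1 to 2000 characters long")
    if not 0.5 <= speaking_rate <= 1.5:
        raise ValueError("speaking_rate must satisfy 0.5 <= x <= 1.5")
    body = {
        "text": text,
        "voice": "Ashley",                 # documented default
        "modelId": "inworld-tts-1.5-max",  # documented default
        "encoding": "MP3",                 # documented default
        "sample_rate": 48000,              # documented default
        "speaking_rate": speaking_rate,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumption: synthesis is a POST request
    )
```

Passing the returned object to `urllib.request.urlopen` would send it once `ENDPOINT` points at the real URL.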

Response

Synthesis successful. Unlike other SLNG TTS models, which return raw audio bytes, Inworld Max responds with JSON. The request is SLNG-normalized, but the response is Inworld's native body passed through unchanged. It carries base64-encoded audio plus usage and optional timestamp metadata.

Inworld Max 1.5 synthesis result. Inworld's native response, passed through by the gateway.

audioContent

string<byte>

Base64-encoded audio in the requested encoding. Max 16MB.
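Since the audio arrives as a base64 string inside JSON rather than as raw bytes, the client has to decode it before playback or storage. A minimal sketch, assuming the response body has already been read into a string (`extract_audio` is a hypothetical helper name):

```python
import base64
import json


def extract_audio(response_body: str) -> bytes:
    """Decode the base64 `audioContent` field of the JSON response into bytes."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["audioContent"])
```

The resulting bytes are in the encoding requested via `encoding`, so with the default they can be written directly to a `.mp3` file opened in binary mode.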

usage

object

Synthesis usage details.

timestampInfo

object

Timestamp metadata. Present only when timestampType is WORD or CHARACTER.
