Pronunciation dictionaries

Pronunciation dictionaries let you control how text is spoken before it reaches the selected TTS model. Create a dictionary once, then reference it from HTTP, WebSocket, or Unified TTS requests. Use them when a voice needs to pronounce acronyms, product names, customer names, jargon, or multilingual terms consistently across TTS models.

Pronunciation dictionaries currently support rewrite mode. The SLNG API rewrites matching words or phrases before synthesis, and the rewritten text is what the selected model receives.

How it works

Each dictionary belongs to the organization resolved from your API key. Requests from another organization cannot read or use it. The basic flow is:

Create a dictionary with rewrite rules.
Reference that dictionary by name or dictionary_id.
Send a TTS request with a pronunciation object.

Only one active pronunciation dictionary can apply to a request or WebSocket turn.

Create a dictionary

Create a dictionary by sending a POST request to the pronunciation dictionaries endpoint. The full schema is in the pronunciation dictionary API reference.

curl -X POST https://api.slng.ai/v1/pronunciation/dictionaries \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "brand-pronunciations",
    "metadata": {
      "language": "hi-IN",
      "providers": ["sarvam", "cartesia"]
    },
    "modes": {
      "rewrite": {
        "rules": [
          { "match": "NAIC", "replace": "en ay eye see" },
          { "match": "B2B", "replace": "bee to bee" }
        ]
      }
    }
  }'

A successful response includes the dictionary ID, organization ID, name, normalized name, metadata, modes, content hash, and creation timestamp:

{
  "id": "pd_01abc...",
  "org_id": "org_123",
  "name": "brand-pronunciations",
  "normalized_name": "brand-pronunciations",
  "metadata": {
    "language": "hi-IN",
    "providers": ["sarvam", "cartesia"]
  },
  "modes": {
    "rewrite": {
      "rules": [
        { "match": "NAIC", "replace": "en ay eye see" },
        { "match": "B2B", "replace": "bee to bee" }
      ]
    }
  },
  "content_hash": "sha256:...",
  "created_at": "2026-05-15T12:00:00.000Z"
}

Dictionary names must be unique within your organization. A name can contain letters, numbers, periods (.), underscores (_), and hyphens (-), and can be up to 128 characters long.
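The naming rule above can be checked on the client before a create request is sent. The following is a minimal sketch, not SLNG's own validation: `is_valid_dictionary_name` and `build_create_payload` are hypothetical helper names, and the sketch assumes that "letters" and "numbers" mean ASCII characters, that a name needs at least one character, and that `metadata` may be omitted.

```python
import json
import re

# Assumption: "letters" and "numbers" mean ASCII letters and digits, and a
# name needs at least one character. The server may apply different rules.
NAME_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,128}")


def is_valid_dictionary_name(name):
    """Return True if name uses only the documented characters and length."""
    return NAME_PATTERN.fullmatch(name) is not None


def build_create_payload(name, replacements, metadata=None):
    """Build the JSON body for a create request from a match -> replace mapping."""
    if not is_valid_dictionary_name(name):
        raise ValueError(f"invalid dictionary name: {name!r}")
    rules = [{"match": m, "replace": r} for m, r in replacements.items()]
    payload = {"name": name, "modes": {"rewrite": {"rules": rules}}}
    # Assumption: metadata is optional, so it is only sent when provided.
    if metadata is not None:
        payload["metadata"] = metadata
    return json.dumps(payload)
```

The returned string can be passed as the `-d` body of the curl command shown earlier. The server remains the authority on whether a name is accepted.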

Manage dictionaries

For request and response schemas, see the generated API reference pages for listing dictionaries, reading one dictionary, and deleting a dictionary.

List dictionaries:

curl -s https://api.slng.ai/v1/pronunciation/dictionaries \
  -H "Authorization: Bearer YOUR_API_KEY"

Get one dictionary by name:

curl -s https://api.slng.ai/v1/pronunciation/dictionaries/brand-pronunciations \
  -H "Authorization: Bearer YOUR_API_KEY"

Delete a dictionary:

curl -X DELETE https://api.slng.ai/v1/pronunciation/dictionaries/brand-pronunciations \
  -H "Authorization: Bearer YOUR_API_KEY"
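The three management calls differ only in HTTP method and path, so they can share one helper. This is a sketch using Python's standard `urllib`; `build_dictionary_request` is a hypothetical name, and the function only constructs the request without sending it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.slng.ai/v1/pronunciation/dictionaries"


def build_dictionary_request(method, api_key, name=None):
    """Build (but do not send) a list, get, or delete request."""
    url = BASE_URL
    if name is not None:
        # Documented name characters need no escaping; quoting is defensive.
        url = f"{BASE_URL}/{urllib.parse.quote(name, safe='')}"
    return urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

To send a request, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_dictionary_request("GET", "YOUR_API_KEY"))` to list dictionaries.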

Use a dictionary with HTTP TTS

Add a pronunciation object to the TTS request body:

curl -X POST https://api.slng.ai/v1/tts/sarvam/bulbul:v3 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "NAIC policy check karein aur B2B portal pe login karein",
    "speaker": "shubh",
    "target_language_code": "hi-IN",
    "pronunciation": {
      "mode": "rewrite",
      "name": "brand-pronunciations"
    }
  }'

You can also reference the dictionary by immutable ID:

{
  "pronunciation": {
    "mode": "rewrite",
    "dictionary_id": "pd_01abc..."
  }
}

Rules for the request object:

mode must be "rewrite"
provide exactly one of name or dictionary_id
use only one active dictionary per request

With the example dictionary above, the selected model receives this rewritten text:

en ay eye see policy check karein aur bee to bee portal pe login karein
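The rules for the request object listed above can be checked before a TTS request is sent, which avoids a round trip that would end in an error. This is a client-side sketch only; `validate_pronunciation` is a hypothetical helper, and it does not check whether the dictionary actually exists.

```python
def validate_pronunciation(pronunciation):
    """Return a list of problems; an empty list means the object looks valid."""
    problems = []
    if pronunciation.get("mode") != "rewrite":
        problems.append('mode must be "rewrite"')
    has_name = "name" in pronunciation
    has_id = "dictionary_id" in pronunciation
    # Exactly one of the two references must be present.
    if has_name == has_id:
        problems.append("provide exactly one of name or dictionary_id")
    return problems
```

For example, `validate_pronunciation({"mode": "rewrite", "name": "brand-pronunciations"})` returns an empty list, while an object that carries both `name` and `dictionary_id` returns one problem.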

Use a dictionary with WebSocket TTS

Set a default dictionary when you initialize the session:

{
  "type": "init",
  "config": {
    "pronunciation": {
      "mode": "rewrite",
      "name": "brand-pronunciations"
    }
  }
}

Then send text normally:

{
  "type": "text",
  "text": "NAIC policy check karein aur B2B portal pe login karein"
}

To change dictionaries for a later turn, include pronunciation on the text message:

{
  "type": "text",
  "text": "Use another dictionary for this turn",
  "pronunciation": {
    "mode": "rewrite",
    "name": "finance-pronunciations"
  }
}

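init.config.pronunciation sets the session default. A pronunciation object on a text message replaces the active dictionary starting with that turn. Later text turns that carry no pronunciation object keep using the most recently set dictionary, which is not necessarily the init default.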
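Because an override persists into later turns, it can help to trace which dictionary is in effect for each text message. The sketch below models the behavior described above on the client; `active_dictionaries` is a hypothetical helper and is not part of the SLNG API.

```python
def active_dictionaries(messages):
    """Return the pronunciation object in effect for each text turn, in order."""
    active = None
    per_turn = []
    for message in messages:
        if message.get("type") == "init":
            # The init message sets the session default, if any.
            active = message.get("config", {}).get("pronunciation")
        elif message.get("type") == "text":
            # A turn-level object replaces the active dictionary from here on.
            if "pronunciation" in message:
                active = message["pronunciation"]
            per_turn.append(active)
    return per_turn
```

Given an init message that selects brand-pronunciations, a plain text turn, a text turn that selects finance-pronunciations, and another plain text turn, the three turns resolve to brand-pronunciations, finance-pronunciations, and finance-pronunciations.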

Use a dictionary with Unified TTS

Use the same pronunciation shape with Unified TTS bridge requests:

{
  "type": "init",
  "model": "sarvam/bulbul:v3",
  "voice": "shubh",
  "config": {
    "language": "hi-IN",
    "sample_rate": 24000,
    "encoding": "linear16",
    "pronunciation": {
      "mode": "rewrite",
      "name": "brand-pronunciations"
    }
  }
}

Rewrite matching

Rewrite mode is deterministic:

matching is case-insensitive
only whole words or whole phrases are matched
longer phrases win before shorter matches
rewriting is single-pass and non-recursive

For example, this dictionary prefers B2B portal over the shorter B2B match:

{
  "modes": {
    "rewrite": {
      "rules": [
        { "match": "B2B", "replace": "bee to bee" },
        { "match": "B2B portal", "replace": "bee to bee portal" }
      ]
    }
  }
}

Input:

Use the B2B portal today

Rewritten result:

Use the bee to bee portal today
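To preview how a rule set is likely to behave before uploading it, the four matching properties above can be approximated locally. This is a sketch and not the SLNG implementation: `apply_rewrite` is a hypothetical function, and it assumes that a "whole word" is delimited by non-word characters in the regex sense and that spaces inside a phrase must match exactly.

```python
import re


def apply_rewrite(text, rules):
    """Apply rewrite rules in one pass, preferring longer matches."""
    if not rules:
        return text
    # Case-insensitive lookup table from match text to replacement.
    replacements = {rule["match"].lower(): rule["replace"] for rule in rules}
    # Longest alternatives first, so "B2B portal" wins over "B2B".
    alternatives = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(a) for a in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    # re.sub scans the input once and never rescans replaced text,
    # which gives single-pass, non-recursive behavior.
    return pattern.sub(
        lambda m: replacements.get(m.group(0).lower(), m.group(0)), text
    )
```

With the two rules from the example above, `apply_rewrite("Use the B2B portal today", rules)` produces the rewritten result shown. The server's tokenization may differ at the edges, for instance around punctuation or repeated spaces, so treat the output as a preview only.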

Limits and errors

Current limits:

Limit	Value
Dictionary name	128 characters
Rewrite rules per dictionary	256
IPA rules per dictionary	256
`match` length	128 characters
`replace` length	256 characters
`ipa` length	256 characters
Runtime text input for rewrite	20,000 characters
Runtime rewritten output	100,000 characters
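The rewrite limits in the table can be checked before a dictionary is created or a long text is submitted. This sketch assumes that "characters" are counted the way Python's `len` counts them (Unicode code points); the server may count differently. `check_rewrite_rules` and `input_within_limit` are hypothetical helpers.

```python
MAX_RULES = 256
MAX_MATCH_CHARS = 128
MAX_REPLACE_CHARS = 256
MAX_INPUT_CHARS = 20_000


def check_rewrite_rules(rules):
    """Return a list of limit violations for a rewrite rule list."""
    problems = []
    if len(rules) > MAX_RULES:
        problems.append(f"too many rules: {len(rules)} > {MAX_RULES}")
    for index, rule in enumerate(rules):
        if len(rule["match"]) > MAX_MATCH_CHARS:
            problems.append(f"rule {index}: match exceeds {MAX_MATCH_CHARS} characters")
        if len(rule["replace"]) > MAX_REPLACE_CHARS:
            problems.append(f"rule {index}: replace exceeds {MAX_REPLACE_CHARS} characters")
    return problems


def input_within_limit(text):
    """Return True if the text fits the runtime input limit for rewrite."""
    # Assumption: len() approximates the server's character count.
    return len(text) <= MAX_INPUT_CHARS
```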

Pronunciation resolution fails closed: if the dictionary cannot be found or resolved, the request or WebSocket turn is not sent to the selected TTS model.

Common HTTP errors:

Status and code	Meaning
`400 invalid_pronunciation`	Malformed object, invalid name, unsupported mode, or dictionary not found
`401 pronunciation_unauthenticated`	Missing or unresolved organization context
`409 pronunciation_conflict`	Dictionary name already exists in the organization
`503 pronunciation_unavailable`	Gateway storage or dependency failure during dictionary resolution
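A client can map each status and code pair from the table to a follow-up action. The action names below are this sketch's own suggestions, not documented SLNG guidance; in particular, treating 503 as retryable is an assumption. How the code is carried in the HTTP error body is not shown in this guide, so the helper takes the status and code as plain arguments.

```python
# Actions are suggestions made by this sketch, not documented SLNG behavior.
ERROR_ACTIONS = {
    (400, "invalid_pronunciation"): "fix_request",
    (401, "pronunciation_unauthenticated"): "check_api_key",
    (409, "pronunciation_conflict"): "choose_another_name",
    (503, "pronunciation_unavailable"): "retry_later",
}


def suggested_action(status, code):
    """Return a suggested follow-up for a pronunciation error, or "unknown"."""
    return ERROR_ACTIONS.get((status, code), "unknown")
```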

Common WebSocket failures return an error frame:

{
  "type": "error",
  "code": "pronunciation_not_found",
  "message": "Pronunciation dictionary not found: brand-pronunciations",
  "slng_request_id": "..."
}
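On the client, an error frame like the one above can be separated from other frames by its type field. This is a minimal sketch that relies only on the fields shown in the example frame; `parse_error_frame` is a hypothetical helper, and other frame types are simply ignored.

```python
import json


def parse_error_frame(raw):
    """Return (code, message, request_id) for an error frame, else None."""
    frame = json.loads(raw)
    if frame.get("type") != "error":
        return None
    return frame.get("code"), frame.get("message"), frame.get("slng_request_id")
```

Logging the `slng_request_id` value alongside the code gives a reference that identifies the failed turn.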

Current limitations

Only mode: "rewrite" is executable today.
modes.ipa can be stored but is not executed.
There is no automatic fallback from IPA to rewrite.
Provider-native pronunciation dictionary uploads are not supported through the SLNG API.

Recommended pattern

For most applications, create a stable dictionary such as brand-pronunciations and reuse it by name. Use dictionary_id only when your application needs an immutable machine reference. Keep dictionaries scoped to a domain, product line, or voice style. For WebSocket sessions, set the default dictionary in init, then override individual turns only when needed.
