Text-to-Speech API for generating speech from text using Sarvam AI Bulbul. High-quality multilingual TTS for Indian languages with 30+ speaker voices. Establishes a WebSocket connection for real-time text-to-speech using the unified SLNG TTS protocol.

{
  "type": "init",
  "model": "sarvam/bulbul:v3",
  "voice": "shubh",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16"
  }
}

{
  "type": "text",
  "text": "Hello, this is a test of text-to-speech synthesis."
}

{
  "type": "flush"
}

{
  "type": "clear"
}

{
  "type": "cancel"
}

{
  "type": "ready",
  "session_id": "sess_tts_abc123"
}

{
  "type": "audio",
  "data": {
    "request_id": "20260320_9a16651d-38fa-430a-b688-7d0bea33972f",
    "content_type": "audio/pcm",
    "audio": "KAVWBYAFcgVtBUE..."
  }
}

{
  "type": "segment_start",
  "segment_id": "seg_001"
}

{
  "type": "segment_end",
  "segment_id": "seg_001"
}

{
  "type": "flushed"
}

{
  "type": "cleared"
}

{
  "type": "audio_end",
  "duration": 3.5
}

{
  "type": "error",
  "code": "provider_error",
  "message": "Provider returned an unexpected error"
}
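The client-side messages in the examples (init, text, flush, clear, cancel) can be assembled with a few small helpers. The following is a minimal sketch, not part of the SLNG documentation: the function names are hypothetical, and it assumes each message is serialized as JSON and sent as its own WebSocket text frame, in the same order the examples appear.

```python
import json

# Control message types taken from the examples; each carries no extra fields.
CONTROL_TYPES = ("flush", "clear", "cancel")


def init_message(model, voice, sample_rate=24000, encoding="linear16"):
    """Build the "init" message that configures model, voice, and audio format."""
    return {
        "type": "init",
        "model": model,
        "voice": voice,
        "config": {"sample_rate": sample_rate, "encoding": encoding},
    }


def text_message(text):
    """Build a "text" message carrying the text to synthesize."""
    return {"type": "text", "text": text}


def control_message(kind):
    """Build a "flush", "clear", or "cancel" message."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message type: {kind!r}")
    return {"type": kind}


def encode_frames(messages):
    """Serialize each message to a JSON string (assumed: one string per frame)."""
    return [json.dumps(message) for message in messages]


# Assumed order: configure the session, send text, then force delivery.
frames = encode_frames([
    init_message("sarvam/bulbul:v3", "shubh"),
    text_message("Hello, this is a test of text-to-speech synthesis."),
    control_message("flush"),
])
```

Each entry in `frames` would then be passed to whatever WebSocket client is in use. Keeping the builders separate from the transport makes the message shapes easy to check without a live connection.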
API key issued by SLNG. Pass as Authorization: Bearer <token> in the WebSocket upgrade request headers.
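As a small illustration of the authentication rule above, the header can be built once and handed to the WebSocket client at connection time. This is a sketch under assumptions: `upgrade_headers` is a hypothetical helper, the key shown is a placeholder, and the endpoint URL is omitted because this section does not state it.

```python
def upgrade_headers(api_key: str) -> dict[str, str]:
    """Return the headers to attach to the WebSocket upgrade (GET) request."""
    token = api_key.strip()
    if not token:
        # Fail early rather than sending an empty bearer token.
        raise ValueError("SLNG API key is empty")
    return {"Authorization": f"Bearer {token}"}


# Placeholder value; a real key would normally come from secure configuration.
headers = upgrade_headers("YOUR_SLNG_API_KEY")
```

How the headers are attached depends on the client library, so its documentation on custom upgrade headers is the place to check.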
GET
Target world part override. Auto-selected if not provided. Available world parts: ap.
init: Initialize a TTS session with model and voice configuration.
text: Send text to synthesize into audio output.
flush: Force any buffered text/audio to be finalized and delivered.
clear: Clear any queued text/audio from the current session.
cancel: Cancel the current generation and stop any further audio.
ready: Indicates the session is ready to receive messages.
audio: Audio response from Sarvam Bulbul TTS with metadata.
segment_start: Signals the start of a synthesized segment.
segment_end: Signals the end of a synthesized segment.
flushed: Acknowledges that buffered output was flushed.
cleared: Acknowledges that queued output was cleared.
audio_end: Signals the end of audio generation.
error: Indicates an error occurred during synthesis.
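The server-side messages described above can be handled by dispatching on the type field. The sketch below is illustrative only and rests on several assumptions not confirmed by this page: the audio field is base64-encoded, linear16 means 16-bit PCM (2 bytes per sample), the stream is mono, and duration is in seconds. The class name `TtsReceiver` and its methods are hypothetical.

```python
import base64
import json

BYTES_PER_SAMPLE = 2  # linear16 = 16-bit PCM (assumption stated in the lead-in)


class TtsReceiver:
    """Collects PCM audio and session state from incoming server messages."""

    def __init__(self, sample_rate=24000, channels=1):
        self.sample_rate = sample_rate
        self.channels = channels  # mono is an assumption
        self.pcm = bytearray()
        self.session_id = None
        self.reported_duration = None
        self.done = False

    def handle(self, raw):
        """Process one JSON message string and return its type."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "ready":
            self.session_id = msg.get("session_id")
        elif kind == "audio":
            # Assumed: "audio" holds base64-encoded PCM bytes.
            self.pcm.extend(base64.b64decode(msg["data"]["audio"]))
        elif kind == "audio_end":
            self.reported_duration = msg.get("duration")
            self.done = True
        elif kind == "error":
            raise RuntimeError(f"{msg.get('code')}: {msg.get('message')}")
        # segment_start, segment_end, flushed, cleared need no state here.
        return kind

    def seconds_buffered(self):
        """Duration of the collected audio: bytes / (rate * channels * 2)."""
        bytes_per_second = self.sample_rate * self.channels * BYTES_PER_SAMPLE
        return len(self.pcm) / bytes_per_second
```

With the example configuration (24000 Hz, linear16, mono assumed), one second of audio is 24000 × 2 = 48000 bytes, which gives a quick sanity check on the received stream against the duration reported by audio_end.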