{
  "type": "init",
  "model": "bulbul:v3",
  "voice": "shubh",
  "config": {
    "language": "en-IN",
    "speech_sample_rate": "24000",
    "encoding": "linear16",
    "pace": 1,
    "temperature": 0.6
  }
}

{
  "type": "text",
  "text": "Hello, this is a test of text-to-speech synthesis."
}

{
  "type": "flush"
}

{
  "type": "clear"
}

{
  "type": "close"
}

{
  "type": "keepalive"
}

{
  "type": "ready",
  "session_id": "sess_tts_abc123"
}

{
  "type": "audio",
  "data": {
    "request_id": "20260320_9a16651d-38fa-430a-b688-7d0bea33972f",
    "content_type": "audio/pcm",
    "audio": "KAVWBYAFcgVtBUE..."
  }
}

{
  "type": "segment_start",
  "segment_id": "seg_001"
}

{
  "type": "segment_end",
  "segment_id": "seg_001"
}

{
  "type": "flushed"
}

{
  "type": "cleared"
}

{
  "type": "audio_end",
  "duration": 3.5
}

{
  "type": "error",
  "code": "provider_error",
  "message": "Provider returned an unexpected error"
}

Bulbul v3
Stream multilingual Indian-language speech from Sarvam AI’s Bulbul v3 over WebSocket, with 30+ speaker voices, through SLNG’s unified low-latency TTS protocol.
API key issued by SLNG. Pass as Authorization: Bearer <token> in the WebSocket upgrade request headers.
GET
Target world part override. Auto-selected if not provided. Available world parts: ap.
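
As a minimal sketch of the authentication note above, the helper below builds the headers for the WebSocket upgrade (GET) request. The function name build_upgrade_headers is an illustrative assumption, not part of the SLNG API. The endpoint URL and the name of the world-part parameter are not given on this page, so the sketch leaves them out rather than guessing.

```python
def build_upgrade_headers(api_key: str) -> dict[str, str]:
    """Return the headers to send with the WebSocket upgrade request.

    build_upgrade_headers is a hypothetical helper written for this sketch.
    """
    token = api_key.strip()
    if not token:
        raise ValueError("an SLNG API key is required")
    # Documented scheme: Authorization: Bearer <token>
    return {"Authorization": f"Bearer {token}"}
```

Most WebSocket client libraries accept a dictionary like this as extra handshake headers; the exact argument name depends on the library and its version.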
init: Initialize a Sarvam Bulbul TTS session with provider-specific configuration.
text: Send text to synthesize into audio output.
flush: Force any buffered text/audio to be finalized and delivered.
clear: Clear any queued text/audio from the current session.
close: Close the session and stop any further audio.
keepalive: Keep the WebSocket connection alive during silence.
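
The client-to-server messages described above (init, text, flush, clear, close, keepalive) can be sketched as a small frame builder. The payload fields mirror the example messages on this page; the helper names init_message and client_frames are hypothetical. Whether a client must wait for the ready message before sending text is not stated here, so this sketch only shows the order in which frames would be produced, not when to send them.

```python
import json
from typing import Iterator


def init_message(voice: str = "shubh", language: str = "en-IN") -> dict:
    """Build the init message, mirroring the example payload on this page."""
    return {
        "type": "init",
        "model": "bulbul:v3",
        "voice": voice,
        "config": {
            "language": language,
            "speech_sample_rate": "24000",
            "encoding": "linear16",
            "pace": 1,
            "temperature": 0.6,
        },
    }


def client_frames(sentences: list[str]) -> Iterator[str]:
    """Yield JSON text frames: init, one text message per sentence, flush, close."""
    yield json.dumps(init_message())
    for sentence in sentences:
        yield json.dumps({"type": "text", "text": sentence})
    # flush asks the server to finalize and deliver anything still buffered.
    yield json.dumps({"type": "flush"})
    yield json.dumps({"type": "close"})
```

Each yielded string would be sent as one WebSocket text frame. A clear or keepalive message has the same single-field shape as flush and close.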
ready: Indicates the session is ready to receive messages.
audio: Audio response from Sarvam Bulbul TTS with metadata.
segment_start: Signals the start of a synthesized segment.
segment_end: Signals the end of a synthesized segment.
flushed: Acknowledges that buffered output was flushed.
cleared: Acknowledges that queued output was cleared.
audio_end: Signals the end of audio generation.
error: Indicates an error occurred during synthesis.