Real-time bidirectional streaming for TTS and STT services. WebSockets provide the lowest latency and most interactive experience for voice applications.
Why WebSockets?
Lowest Latency: Sub-100ms response times
Bidirectional: Send and receive data simultaneously
Real-time Feedback: Get immediate responses
Efficient: Single persistent connection
Binary Support: Stream raw audio data
SLNG Protocol
All SLNG WebSocket endpoints use a consistent protocol, regardless of the underlying provider. This ensures easy switching between models without changing your integration.
TTS WebSocket Protocol
Connection
WSS wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}
Available Endpoints:
wss://api.slng.ai/v1/tts/deepgram/aura:2
wss://api.slng.ai/v1/tts/elevenlabs/eleven-flash:2.5
wss://api.slng.ai/v1/tts/slng/canopylabs/orpheus:en
Client → Server Messages
Initialize Session
Initialize a session with model and voice configuration before sending text.
{
  "type": "init",
  "model": "aura:2",
  "voice": "aura-2-thalia-en",
  "config": {
    "sample_rate": 24000,
    "encoding": "linear16",
    "language": "en",
    "speed": 1.0
  }
}
Parameters:
type ("init", required): Message type
model (string, required): Model identifier
voice (string, optional): Voice identifier
config (object, optional): Session configuration
config.sample_rate (number, optional): Audio sample rate in Hz
config.encoding ("linear16" | "mp3" | "opus", optional): Audio encoding format
config.language (string, optional): Language code
config.speed (number, optional): Speech speed multiplier
Send Text for Synthesis
Send text to synthesize into audio output.
{
  "type": "text",
  "text": "Hello from SLNG.",
  "flush": false
}
Parameters:
type ("text", required): Message type
text (string, required): Text to synthesize
flush (boolean, optional): Whether to flush remaining audio immediately
Flush Buffer
Force any buffered text/audio to be finalized and delivered.
Clear Buffer
Clear any queued text/audio from the current session.
Cancel Generation
Cancel the current generation and stop any further audio.
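These three control messages carry no payload beyond their type. The `flush` and `cancel` shapes below match the messages sent in the examples later on this page; the `clear` shape is an assumption based on the same pattern:

```json
{ "type": "flush" }
{ "type": "clear" }
{ "type": "cancel" }
```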
Server → Client Messages
Session Ready
Indicates the session is ready to receive messages.
{
  "type": "ready",
  "session_id": "session_123"
}
Audio Chunk
Chunk of base64-encoded audio data.
{
  "type": "audio_chunk",
  "data": "SGVsbG8gV29ybGQ=",
  "sequence": 1
}
type ("audio_chunk", required): Message type
data (string, required): Base64-encoded audio data
sequence (integer, optional): Sequence number for ordering
Note: Audio may also be delivered as raw binary WebSocket frames for better performance.
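When audio arrives as an `audio_chunk` JSON message rather than a binary frame, the base64 payload must be decoded back to raw bytes before playback. A minimal browser-side sketch (the function name is illustrative, not part of the API):

```javascript
// Decode a base64 audio_chunk payload into raw PCM bytes.
function decodeAudioChunk(base64Data) {
  const binary = atob(base64Data); // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i); // one byte per character
  }
  return bytes; // pass bytes.buffer to your audio pipeline
}
```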
Segment Start
Signals the start of a synthesized segment.
{
  "type": "segment_start",
  "segment_id": "seg_1"
}
Segment End
Signals the end of a synthesized segment.
{
  "type": "segment_end",
  "segment_id": "seg_1"
}
Flushed
Acknowledges that buffered output was flushed.
Cleared
Acknowledges that queued output was cleared.
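Like the control messages they acknowledge, these carry only a type. The `flushed` type appears in the JavaScript example later on this page; `cleared` is assumed to follow the same shape:

```json
{ "type": "flushed" }
{ "type": "cleared" }
```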
Audio End
Signals the end of audio generation.
{
  "type": "audio_end",
  "duration": 0.12
}
type ("audio_end", required): Message type
duration (number, optional): Audio duration
Error
Indicates an error occurred during synthesis.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
type ("error", required): Message type
code (string, required): Error code
message (string, required): Human-readable error description
Common Error Codes:
auth_error: Invalid or missing API key
config_error: Invalid configuration
rate_limit: Too many requests
provider_error: Upstream provider error
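One way to route these codes is a small handler like the `handleServerError` used in the Best Practices section below. The retry and backoff choices here are illustrative policy, not prescribed by the API:

```javascript
// Map SLNG error codes to recovery actions (illustrative policy).
function handleServerError(code, message) {
  switch (code) {
    case "auth_error":
      // Retrying with the same key will not help.
      throw new Error(`Authentication failed: ${message}`);
    case "config_error":
      // Fix the init payload before reconnecting.
      throw new Error(`Invalid configuration: ${message}`);
    case "rate_limit":
      // Transient: back off and let reconnection logic retry.
      console.warn(`Rate limited: ${message}`);
      return { retry: true, delayMs: 5000 };
    case "provider_error":
      // Upstream hiccup: usually safe to retry.
      console.warn(`Provider error: ${message}`);
      return { retry: true, delayMs: 1000 };
    default:
      console.error(`Unknown error ${code}: ${message}`);
      return { retry: false };
  }
}
```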
STT WebSocket Protocol
Connection
WSS wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}
Available Endpoints:
wss://api.slng.ai/v1/stt/deepgram/nova:2
wss://api.slng.ai/v1/stt/slng/openai/whisper:large-v3
Client → Server Messages
Initialize Session
Initialize a session with recognition configuration before streaming audio.
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16",
    "enable_vad": true,
    "enable_diarization": false,
    "enable_word_timestamps": true,
    "enable_partials": true
  }
}
Parameters:
type ("init", required): Message type
config (object, optional): Recognition configuration
config.language (string, optional): Language code for recognition
config.sample_rate (number, optional): Audio sample rate in Hz
config.encoding ("linear16" | "mp3" | "opus", optional): Audio encoding format
config.enable_vad (boolean, optional): Enable voice activity detection
config.enable_diarization (boolean, optional): Enable speaker diarization
config.enable_word_timestamps (boolean, optional): Include word-level timestamps
config.enable_partials (boolean, optional): Enable partial/interim transcripts
Send Audio Data
Stream an audio frame to be transcribed. After initialization, send audio in one of two formats:
Binary frames (recommended for performance):
Send raw PCM audio samples directly as binary WebSocket frames:
// Send raw audio bytes directly
ws.send(audioBuffer); // ArrayBuffer or Uint8Array
JSON messages with base64-encoded data:
{
  "type": "audio",
  "data": "SGVsbG8gV29ybGQ="
}
type ("audio", required): Message type
data (string, required): Base64-encoded audio data
Finalize Transcription
Signal that no more audio frames will be sent.
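As with the TTS control messages, this is a bare type, matching the `finalize` message sent in the STT examples below:

```json
{ "type": "finalize" }
```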
Server → Client Messages
Session Ready
Indicates the session is ready to receive audio.
{
  "type": "ready",
  "session_id": "session_123"
}
Partial Transcript
Interim transcription result.
{
  "type": "partial_transcript",
  "transcript": "Hello",
  "confidence": 0.91
}
type ("partial_transcript", required): Message type
transcript (string, required): Transcribed text so far
confidence (number, optional): Confidence score (0-1)
Final Transcript
Final transcription result with optional metadata.
{
  "type": "final_transcript",
  "transcript": "Hello world",
  "confidence": 0.97,
  "language": "en",
  "duration": 2.5
}
type ("final_transcript", required): Message type
transcript (string, required): Complete transcribed text
confidence (number, optional): Overall confidence score (0-1)
language (string, optional): Detected or specified language code
duration (number, optional): Audio duration
Error
Indicates an error occurred during recognition.
{
  "type": "error",
  "code": "provider_error",
  "message": "Upstream error"
}
type ("error", required): Message type
code (string, required): Error code
message (string, required): Human-readable error description
Complete Examples
JavaScript/TypeScript TTS
const ws = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2", {
  headers: { Authorization: "Bearer YOUR_API_KEY" },
});

ws.onopen = () => {
  console.log("Connected");

  // Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      model: "aura:2",
      voice: "aura-2-thalia-en",
      config: {
        encoding: "linear16",
        sample_rate: 24000,
      },
    }),
  );

  // Send text to synthesize
  ws.send(
    JSON.stringify({
      type: "text",
      text: "Hello! This is a WebSocket TTS example.",
    }),
  );

  // Flush to get remaining audio
  ws.send(JSON.stringify({ type: "flush" }));
};

ws.onmessage = (event) => {
  // Handle binary audio data
  if (event.data instanceof ArrayBuffer) {
    playAudio(event.data);
    return;
  }

  // Handle JSON control messages
  const message = JSON.parse(event.data);
  console.log("Server message:", message.type);

  switch (message.type) {
    case "ready":
      console.log("Session ready:", message.session_id);
      break;
    case "audio_chunk":
      // Base64-encoded audio (alternative to binary)
      playAudio(atob(message.data));
      break;
    case "segment_start":
      console.log("Segment started:", message.segment_id);
      break;
    case "segment_end":
      console.log("Segment ended:", message.segment_id);
      break;
    case "flushed":
      console.log("Buffer flushed");
      break;
    case "audio_end":
      console.log("Audio complete, duration:", message.duration);
      break;
    case "error":
      console.error("Error:", message.code, message.message);
      break;
  }
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = (event) => {
  console.log("Connection closed:", event.code, event.reason);
};
Python TTS
import websocket
import json
import base64

def on_message(ws, message):
    # Handle binary audio data
    if isinstance(message, bytes):
        handle_audio(message)
        return

    # Handle JSON control messages
    data = json.loads(message)
    msg_type = data.get("type")

    if msg_type == "ready":
        print(f"Session ready: {data['session_id']}")
    elif msg_type == "audio_chunk":
        # Base64-encoded audio (alternative to binary)
        audio_bytes = base64.b64decode(data["data"])
        handle_audio(audio_bytes)
    elif msg_type == "audio_end":
        print(f"Audio complete, duration: {data.get('duration')}")
    elif msg_type == "error":
        print(f"Error: {data['code']} - {data['message']}")

def on_open(ws):
    # Initialize session
    ws.send(json.dumps({
        "type": "init",
        "model": "aura:2",
        "voice": "aura-2-thalia-en",
        "config": {
            "encoding": "linear16",
            "sample_rate": 24000
        }
    }))

    # Send text
    ws.send(json.dumps({
        "type": "text",
        "text": "Hello from Python!"
    }))

    # Flush
    ws.send(json.dumps({"type": "flush"}))

ws = websocket.WebSocketApp(
    "wss://api.slng.ai/v1/tts/deepgram/aura:2",
    header={"Authorization": "Bearer YOUR_API_KEY"},
    on_message=on_message,
    on_open=on_open
)
ws.run_forever()
JavaScript/TypeScript STT
const ws = new WebSocket("wss://api.slng.ai/v1/stt/deepgram/nova:2", {
  headers: { Authorization: "Bearer YOUR_API_KEY" },
});

ws.onopen = () => {
  console.log("Connected");

  // Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        language: "en",
        sample_rate: 16000,
        encoding: "linear16",
        enable_vad: true,
        enable_partials: true,
      },
    }),
  );

  // Start capturing audio from microphone
  startAudioCapture((audioChunk) => {
    // Option 1: Send as binary (recommended - lower overhead)
    ws.send(audioChunk);

    // Option 2: Send as base64-encoded JSON
    // ws.send(JSON.stringify({
    //   type: "audio",
    //   data: btoa(String.fromCharCode(...new Uint8Array(audioChunk)))
    // }));
  });
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case "ready":
      console.log("Session ready:", message.session_id);
      break;
    case "partial_transcript":
      console.log("Partial:", message.transcript);
      showInterimTranscript(message.transcript);
      break;
    case "final_transcript":
      console.log("Final:", message.transcript);
      console.log("Confidence:", message.confidence);
      displayTranscript(message.transcript);
      break;
    case "error":
      console.error("Error:", message.code, message.message);
      break;
  }
};

ws.onclose = () => {
  console.log("Connection closed");
  stopAudioCapture();
};

// When done, send finalize
function stopRecording() {
  ws.send(JSON.stringify({ type: "finalize" }));
}
Python STT
import websocket
import json
import base64
import pyaudio
import threading

def on_message(ws, message):
    data = json.loads(message)
    msg_type = data.get("type")

    if msg_type == "ready":
        print(f"Session ready: {data['session_id']}")
    elif msg_type == "partial_transcript":
        print(f"Partial: {data['transcript']}")
    elif msg_type == "final_transcript":
        print(f"Final: {data['transcript']}")
        print(f"Confidence: {data.get('confidence')}")
        print(f"Language: {data.get('language')}")
    elif msg_type == "error":
        print(f"Error: {data['code']} - {data['message']}")

def stream_audio(ws):
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=16000,
        input=True,
        frames_per_buffer=1024
    )
    try:
        while True:
            audio_data = stream.read(1024)
            # Option 1: Send as binary (recommended - lower overhead)
            ws.send(audio_data, opcode=websocket.ABNF.OPCODE_BINARY)
            # Option 2: Send as base64-encoded JSON
            # ws.send(json.dumps({
            #     "type": "audio",
            #     "data": base64.b64encode(audio_data).decode()
            # }))
    except Exception as e:
        print(f"Audio streaming stopped: {e}")
    finally:
        stream.close()

def on_open(ws):
    # Initialize session
    ws.send(json.dumps({
        "type": "init",
        "config": {
            "language": "en",
            "sample_rate": 16000,
            "encoding": "linear16",
            "enable_vad": True,
            "enable_partials": True
        }
    }))

    # Start audio streaming in a separate thread
    audio_thread = threading.Thread(target=stream_audio, args=(ws,))
    audio_thread.daemon = True
    audio_thread.start()

ws = websocket.WebSocketApp(
    "wss://api.slng.ai/v1/stt/deepgram/nova:2",
    header={"Authorization": "Bearer YOUR_API_KEY"},
    on_message=on_message,
    on_open=on_open
)
ws.run_forever()
Interruption & Cancellation (TTS)
For voice agents, handling interruptions is critical. When a user starts speaking, you need to stop the current TTS output immediately.
Interrupt Current Speech
// Cancel server-side generation
ws.send(JSON.stringify({ type: "cancel" }));

// Clear local audio buffer
clearAudioQueue();

// Stop current playback
stopCurrentAudioSource();
Complete Voice Agent Pattern
class InterruptibleVoiceAgent {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.isAgentSpeaking = false;
    this.audioQueue = [];
    this.initializeTTS();
    this.initializeSTT();
  }

  initializeSTT() {
    this.stt = new WebSocket("wss://api.slng.ai/v1/stt/deepgram/nova:2", {
      headers: { Authorization: `Bearer ${this.apiKey}` },
    });

    this.stt.onopen = () => {
      this.stt.send(
        JSON.stringify({
          type: "init",
          config: {
            language: "en",
            sample_rate: 16000,
            encoding: "linear16",
            enable_vad: true,
            enable_partials: true,
          },
        }),
      );
    };

    this.stt.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "final_transcript") {
        // User spoke - interrupt agent if speaking
        if (this.isAgentSpeaking) {
          this.interruptAgent();
        }
        // Process user input
        this.handleUserInput(msg.transcript);
      }
    };
  }

  initializeTTS() {
    this.tts = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2", {
      headers: { Authorization: `Bearer ${this.apiKey}` },
    });

    this.tts.onopen = () => {
      this.tts.send(
        JSON.stringify({
          type: "init",
          model: "aura:2",
          voice: "aura-2-thalia-en",
          config: { encoding: "linear16", sample_rate: 24000 },
        }),
      );
    };

    this.tts.onmessage = (event) => {
      // Handle binary audio data
      if (event.data instanceof ArrayBuffer) {
        this.playAudio(event.data);
        return;
      }

      // Handle JSON control messages
      const msg = JSON.parse(event.data);
      if (msg.type === "audio_chunk") {
        // Base64-encoded audio (alternative to binary)
        this.playAudio(atob(msg.data));
      } else if (msg.type === "audio_end") {
        this.isAgentSpeaking = false;
      }
    };
  }

  interruptAgent() {
    // 1. Cancel server-side TTS
    this.tts.send(JSON.stringify({ type: "cancel" }));

    // 2. Clear local audio queue
    this.audioQueue = [];

    // 3. Stop current playback
    if (this.currentAudioSource) {
      this.currentAudioSource.stop();
      this.currentAudioSource = null;
    }

    this.isAgentSpeaking = false;
  }

  async handleUserInput(userText) {
    console.log("User said:", userText);

    // Get response from your LLM
    const response = await this.generateResponse(userText);

    // Speak response
    this.speak(response);
  }

  speak(text) {
    this.isAgentSpeaking = true;
    this.tts.send(
      JSON.stringify({
        type: "text",
        text: text,
      }),
    );
    this.tts.send(JSON.stringify({ type: "flush" }));
  }

  async generateResponse(userText) {
    // Your LLM call here
    return "I understand you said: " + userText;
  }

  playAudio(audioData) {
    // Your audio playback implementation
  }
}

// Usage
const agent = new InterruptibleVoiceAgent("YOUR_API_KEY");
See complete voice agent examples →
Best Practices
1. Connection Management
Implement Reconnection Logic:
class ResilientWebSocket {
  constructor(url) {
    this.url = url;
    this.backoff = 1000;
    this.maxBackoff = 30000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onclose = () => {
      console.log(`Reconnecting in ${this.backoff}ms`);
      setTimeout(() => this.connect(), this.backoff);
      this.backoff = Math.min(this.backoff * 2, this.maxBackoff);
    };

    this.ws.onopen = () => {
      console.log("Connected");
      this.backoff = 1000; // Reset backoff on successful connection
    };
  }
}
2. Error Handling
Always handle errors gracefully:
ws.onerror = (error) => {
  console.error("WebSocket error:", error);
  // Notify user, attempt recovery
};

ws.onmessage = (event) => {
  if (typeof event.data === "string") {
    const message = JSON.parse(event.data);
    if (message.type === "error") {
      handleServerError(message.code, message.message);
    }
  }
};
3. Buffer Management (TTS)
Use flush strategically to control latency vs. quality:
// For low latency: flush after each sentence
ws.send(JSON.stringify({ type: "text", text: sentence }));
ws.send(JSON.stringify({ type: "flush" }));

// For better quality: batch multiple sentences
ws.send(JSON.stringify({ type: "text", text: paragraph }));
// ... send more text ...
ws.send(JSON.stringify({ type: "flush" })); // Flush at the end
4. Audio Format Consistency
Ensure your audio format matches configuration:
// Configuration
{
  encoding: "linear16",  // 16-bit PCM
  sample_rate: 16000     // 16 kHz
}

// Audio capture must match:
// - 16-bit samples
// - 16000 Hz sample rate
// - Single channel (mono)
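For example, browser microphone capture via the Web Audio API yields 32-bit float samples, which must be converted to 16-bit signed PCM before sending under a `linear16` configuration. A minimal sketch (the function name is illustrative):

```javascript
// Convert Web Audio float samples (-1.0..1.0) to 16-bit signed PCM.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1] to avoid integer overflow on hot signals
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Scale asymmetrically: -1 -> -32768, +1 -> 32767
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm; // send pcm.buffer over the WebSocket
}
```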
5. Heartbeat/Keep-Alive
Implement ping/pong to keep connection alive:
let pingInterval = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "ping" }));
  }
}, 30000); // Every 30 seconds

ws.onclose = () => {
  clearInterval(pingInterval);
};
Troubleshooting
Connection Drops
Problem: WebSocket disconnects unexpectedly
Solutions:
Implement reconnection logic with exponential backoff
Send periodic heartbeat/ping messages
Check firewall/proxy settings
Verify network stability
Audio Glitches (TTS)
Problem: Choppy or distorted audio playback
Solutions:
Implement proper audio buffering
Use WebAudio API for smooth playback
Check sample rate matches configuration
Ensure sufficient bandwidth
Delayed Transcriptions (STT)
Problem: Transcription results lag behind audio
Solutions:
Reduce audio chunk size for faster processing
Check network latency
Verify audio format matches configuration
Use a model suited to real-time streaming (e.g., Deepgram Nova)
Authentication Failures
Problem: Connection rejected with 401
Solutions:
Include Authorization header in connection request
Verify API key is valid and active
Check key hasn't expired
Ensure proper header format: Authorization: Bearer YOUR_KEY
Performance Tips
Reuse Connections: Keep the WebSocket open for multiple operations
Batch Text: Send longer text chunks for better efficiency (TTS)
Buffer Audio: Stream audio in appropriate chunk sizes (STT)
Monitor Latency: Track end-to-end latency and optimize
Local Processing: Handle audio encoding/decoding client-side
Last modified on February 12, 2026