Real-time bidirectional streaming for TTS and STT services. WebSockets provide the lowest latency and most interactive experience for voice applications.
Why WebSockets?
Lowest Latency: Sub-100ms response times
Bidirectional: Send and receive data simultaneously
Real-time Feedback: Get immediate responses
Efficient: Single persistent connection
Binary Support: Stream raw audio data
SLNG Unified Protocol
All SLNG WebSocket endpoints use a consistent protocol, regardless of the underlying provider, so you can switch providers or models without changing your integration.
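For example, the same init/speak/flush flow shown below works against any TTS endpoint; only the connection URL changes:

// Same client code, different provider: just swap the endpoint URL
const ws = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2");
// const ws = new WebSocket("wss://api.slng.ai/v1/tts/elevenlabs/eleven-flash:2.5");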
TTS WebSocket Protocol
Connection
wss://api.slng.ai/v1/tts/{provider}/{model}:{variant}
Available Endpoints:
wss://api.slng.ai/v1/tts/deepgram/aura:2
wss://api.slng.ai/v1/tts/elevenlabs/eleven-flash:2.5
wss://api.slng.ai/v1/tts/slng/canopylabs/orpheus:en
Client → Server Messages
Initialize Session
{
  "type": "init",
  "config": {
    "encoding": "linear16",
    "sample_rate": 24000,
    "language": "en"
  }
}
Send Text for Synthesis
{
  "type": "speak",
  "text": "Text to convert to speech",
  "flush": false
}
Parameters:
text (required): Text to synthesize
flush (optional): Whether to flush remaining audio immediately
Flush Buffer
Forces the server to send any remaining audio:
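{
  "type": "flush"
}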
Clear Buffer
Clears any pending audio in the buffer:
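By analogy with the other control messages on this page, the body is assumed to be simply:

{
  "type": "clear"
}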
Cancel Generation
Cancels current generation:
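{
  "type": "cancel"
}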
Server → Client Messages
Session Ready
{
  "type": "ready",
  "session_id": "uuid-here"
}
Audio Data (Binary)
Audio data is sent as binary WebSocket frames. Each frame contains:
Raw PCM audio samples
Encoding as specified in config
Sample rate as specified in config
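In the browser, set ws.binaryType = "arraybuffer" so these frames arrive as ArrayBuffer rather than Blob. A minimal receive loop might look like the sketch below (playAudio and handleControlMessage are placeholders for your own handlers):

// Distinguish binary audio frames from JSON control messages
ws.binaryType = "arraybuffer";
ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    playAudio(event.data); // raw PCM in the configured encoding and sample rate
  } else {
    handleControlMessage(JSON.parse(event.data)); // JSON control message
  }
};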
Flushed Acknowledgment
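Sent once the server has flushed all buffered audio; the voice agent example later on this page uses this message to detect when speech output has finished:

{
  "type": "flushed"
}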
Error
{
  "type": "error",
  "code": "error_code",
  "message": "Human-readable error description"
}
Common Error Codes:
auth_error: Invalid or missing API key
config_error: Invalid configuration
rate_limit: Too many requests
provider_error: Upstream provider error
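A sketch of routing these codes to client behavior; the retry choices are assumptions to adapt to your application:

function handleServerError(code, message) {
  switch (code) {
    case "rate_limit":
    case "provider_error":
      // Transient: back off and retry
      console.warn(`Retryable error (${code}): ${message}`);
      break;
    case "auth_error":
    case "config_error":
      // Not retryable: fix the API key or configuration
      console.error(`Fatal error (${code}): ${message}`);
      break;
    default:
      console.error(`Unexpected error (${code}): ${message}`);
  }
}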
STT WebSocket Protocol
Connection
wss://api.slng.ai/v1/stt/{provider}/{model}:{variant}
Available Endpoints:
wss://api.slng.ai/v1/stt/deepgram/nova:2
wss://api.slng.ai/v1/stt/slng/openai/whisper:large-v3
Client → Server Messages
Initialize Session
{
  "type": "init",
  "config": {
    "language": "en",
    "sample_rate": 16000,
    "encoding": "linear16"
  }
}
Send Audio Data (Binary)
After initialization, send audio data as binary WebSocket frames. The audio should be:
Raw PCM samples
Sample rate matching config
Encoding matching config (typically linear16)
// Send raw audio bytes
ws.send(audioBuffer);
Finalize Transcription
Signal that audio stream is complete:
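By analogy with the TTS control messages, the body is assumed to be simply:

{
  "type": "finalize"
}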
Server → Client Messages
Session Ready
{
  "type": "ready",
  "session_id": "uuid-here"
}
Transcript (Interim or Final)
{
  "type": "transcript",
  "text": "Transcribed text",
  "is_final": false,
  "confidence": 0.95,
  "words": [
    {
      "text": "word",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.98
    }
  ]
}
Fields:
text: Transcribed text
is_final: Whether this is the final transcription
confidence: Overall confidence score (0-1)
words: Word-level timestamps and confidence (optional)
Error
{
  "type": "error",
  "code": "error_code",
  "message": "Human-readable error description"
}
Complete Examples
JavaScript/TypeScript TTS
const ws = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2");
ws.binaryType = "arraybuffer"; // receive audio frames as ArrayBuffer instead of Blob
ws.onopen = () => {
  console.log("Connected");

  // Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        encoding: "linear16",
        sample_rate: 24000,
      },
    }),
  );

  // Send text to convert
  ws.send(
    JSON.stringify({
      type: "speak",
      text: "Hello! This is a WebSocket TTS example.",
    }),
  );

  // Flush to get remaining audio
  ws.send(
    JSON.stringify({
      type: "flush",
    }),
  );
};

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Handle binary audio data
    playAudio(event.data);
  } else {
    // Handle JSON control messages
    const message = JSON.parse(event.data);
    console.log("Server message:", message);

    if (message.type === "ready") {
      console.log("Session ready:", message.session_id);
    } else if (message.type === "error") {
      console.error("Error:", message.message);
    }
  }
};

ws.onerror = (error) => {
  console.error("WebSocket error:", error);
};

ws.onclose = (event) => {
  console.log("Connection closed:", event.code, event.reason);
};
// Helper: Play raw linear16 PCM from an ArrayBuffer
function playAudio(audioData) {
  const audioContext = new AudioContext({ sampleRate: 24000 });
  // Convert 16-bit samples to Float32 in [-1, 1] and wrap them in an AudioBuffer
  const float32 = Float32Array.from(new Int16Array(audioData), (s) => s / 32768);
  const buffer = audioContext.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);
  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);
  source.start();
}
Python TTS
import websocket
import json

def on_message(ws, message):
    if isinstance(message, bytes):
        # Handle binary audio data
        handle_audio(message)
    else:
        # Handle JSON control messages
        data = json.loads(message)
        print(f"Server message: {data}")

def on_open(ws):
    # Initialize session
    ws.send(json.dumps({
        "type": "init",
        "config": {
            "encoding": "linear16",
            "sample_rate": 24000
        }
    }))

    # Send text
    ws.send(json.dumps({
        "type": "speak",
        "text": "Hello from Python!"
    }))

    # Flush
    ws.send(json.dumps({"type": "flush"}))

ws = websocket.WebSocketApp(
    "wss://api.slng.ai/v1/tts/deepgram/aura:2",
    on_message=on_message,
    on_open=on_open
)
ws.run_forever()
JavaScript/TypeScript STT
const ws = new WebSocket("wss://api.slng.ai/v1/stt/deepgram/nova:2");

ws.onopen = () => {
  console.log("Connected");

  // Initialize session
  ws.send(
    JSON.stringify({
      type: "init",
      config: {
        language: "en",
        sample_rate: 16000,
      },
    }),
  );

  // Start capturing audio from microphone
  startAudioCapture((audioChunk) => {
    ws.send(audioChunk); // Send raw audio bytes
  });
};

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  if (message.type === "transcript") {
    console.log("Transcript:", message.text);
    console.log("Is final:", message.is_final);
    console.log("Confidence:", message.confidence);

    if (message.is_final) {
      // This is the final transcription for this audio segment
      displayTranscript(message.text);
    } else {
      // This is an interim result
      showInterimTranscript(message.text);
    }
  }
};

ws.onclose = () => {
  console.log("Connection closed");
  stopAudioCapture();
};
Python STT
import websocket
import json
import threading
import pyaudio

def on_message(ws, message):
    data = json.loads(message)
    if data["type"] == "transcript":
        print(f"Transcript: {data['text']}")
        print(f"Is final: {data['is_final']}")
        print(f"Confidence: {data['confidence']}")

def on_open(ws):
    # Initialize session
    ws.send(json.dumps({
        "type": "init",
        "config": {
            "language": "en",
            "sample_rate": 16000
        }
    }))

    # Stream microphone audio from a background thread so the
    # WebSocket loop stays free to receive transcript messages
    def stream_microphone():
        audio = pyaudio.PyAudio()
        stream = audio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=16000,
            input=True,
            frames_per_buffer=1024
        )
        while True:
            data = stream.read(1024)
            ws.send(data, opcode=websocket.ABNF.OPCODE_BINARY)

    threading.Thread(target=stream_microphone, daemon=True).start()

ws = websocket.WebSocketApp(
    "wss://api.slng.ai/v1/stt/deepgram/nova:2",
    on_message=on_message,
    on_open=on_open
)
ws.run_forever()
Interruption & Cancellation (TTS)
For voice agents, handling interruptions is critical. When a user starts speaking, you need to stop the current TTS output immediately.
Interrupt Current Speech
// Cancel server-side generation
ws.send(JSON.stringify({ type: "cancel" }));

// Clear local audio buffer
clearAudioQueue();

// Stop current playback
stopCurrentAudioSource();
Complete Voice Agent Pattern
class InterruptibleVoiceAgent {
  constructor() {
    this.tts = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2");
    this.tts.binaryType = "arraybuffer"; // audio frames arrive as ArrayBuffer
    this.stt = new WebSocket("wss://api.slng.ai/v1/stt/deepgram/nova:2");
    this.isAgentSpeaking = false;
    this.audioQueue = [];
    this.currentAudioSource = null;

    this.initializeTTS();
    this.initializeSTT();
  }
  initializeSTT() {
    this.stt.onopen = () => {
      this.stt.send(
        JSON.stringify({
          type: "init",
          config: { language: "en", sample_rate: 16000 },
        }),
      );
    };

    this.stt.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "transcript" && msg.is_final) {
        // User spoke - interrupt agent if speaking
        if (this.isAgentSpeaking) {
          this.interruptAgent();
        }
        // Process user input
        this.handleUserInput(msg.text);
      }
    };
  }

  initializeTTS() {
    this.tts.onopen = () => {
      this.tts.send(
        JSON.stringify({
          type: "init",
          config: { encoding: "linear16", sample_rate: 24000 },
        }),
      );
    };

    this.tts.onmessage = (event) => {
      if (event.data instanceof ArrayBuffer) {
        this.playAudio(event.data);
      } else {
        const msg = JSON.parse(event.data);
        if (msg.type === "flushed") {
          this.isAgentSpeaking = false;
        }
      }
    };
  }

  interruptAgent() {
    // 1. Cancel server-side TTS
    this.tts.send(JSON.stringify({ type: "cancel" }));

    // 2. Clear local audio queue
    this.audioQueue = [];

    // 3. Stop current playback
    if (this.currentAudioSource) {
      this.currentAudioSource.stop();
      this.currentAudioSource = null;
    }

    this.isAgentSpeaking = false;
  }

  async handleUserInput(userText) {
    console.log("User said:", userText);

    // Get response from your LLM
    const response = await this.generateResponse(userText);

    // Speak response
    this.speak(response);
  }

  speak(text) {
    this.isAgentSpeaking = true;
    this.tts.send(
      JSON.stringify({
        type: "speak",
        text: text,
      }),
    );
    this.tts.send(
      JSON.stringify({
        type: "flush",
      }),
    );
  }

  async generateResponse(userText) {
    // Your LLM call here
    return "I understand you said: " + userText;
  }

  playAudio(audioData) {
    // Your audio playback implementation
  }
}

// Usage
const agent = new InterruptibleVoiceAgent();
See complete voice agent examples →
Best Practices
1. Connection Management
Implement Reconnection Logic:
class ResilientWebSocket {
  constructor(url) {
    this.url = url;
    this.backoff = 1000;
    this.maxBackoff = 30000;
    this.connect();
  }

  connect() {
    this.ws = new WebSocket(this.url);

    this.ws.onclose = () => {
      console.log(`Reconnecting in ${this.backoff}ms`);
      setTimeout(() => this.connect(), this.backoff);
      this.backoff = Math.min(this.backoff * 2, this.maxBackoff);
    };

    this.ws.onopen = () => {
      console.log("Connected");
      this.backoff = 1000; // Reset backoff on successful connection
    };
  }
}
2. Error Handling
Always handle errors gracefully:
ws.onerror = (error) => {
  console.error("WebSocket error:", error);
  // Notify user, attempt recovery
};

ws.onmessage = (event) => {
  if (typeof event.data === "string") {
    const message = JSON.parse(event.data);
    if (message.type === "error") {
      handleServerError(message.code, message.message);
    }
  }
};
3. Buffer Management (TTS)
Use flush strategically to control latency vs. quality:
// For low latency: flush frequently
ws.send(JSON.stringify({ type: "speak", text: sentence, flush: true }));

// For better quality: let buffering happen
ws.send(JSON.stringify({ type: "speak", text: paragraph }));
// ... send more text ...
ws.send(JSON.stringify({ type: "flush" })); // Flush at the end
4. Audio Format Consistency
Ensure your audio format matches configuration:
// Configuration
{
  encoding: 'linear16',  // 16-bit PCM
  sample_rate: 16000     // 16kHz
}

// Audio capture must match:
// - 16-bit samples
// - 16000 Hz sample rate
// - Single channel (mono)
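A browser capture sketch that produces matching 16 kHz, mono, 16-bit PCM. It uses ScriptProcessorNode for brevity (deprecated in favor of AudioWorklet), and startAudioCapture matches the placeholder name used in the STT example above:

// Sketch: capture mic audio and convert Float32 samples to linear16 before sending
async function startAudioCapture(onChunk) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 }); // must match config.sample_rate
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(1024, 1, 1); // mono in, mono out

  processor.onaudioprocess = (event) => {
    const float32 = event.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      // Clamp and convert Float32 [-1, 1] to 16-bit PCM
      const s = Math.max(-1, Math.min(1, float32[i]));
      int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    onChunk(int16.buffer);
  };

  source.connect(processor);
  processor.connect(ctx.destination);
}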
5. Heartbeat/Keep-Alive
Implement ping/pong to keep connection alive:
let pingInterval = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: "ping" }));
  }
}, 30000); // Every 30 seconds

ws.onclose = () => {
  clearInterval(pingInterval);
};
Troubleshooting
Connection Drops
Problem: WebSocket disconnects unexpectedly
Solutions:
Implement reconnection logic with exponential backoff
Send periodic heartbeat/ping messages
Check firewall/proxy settings
Verify network stability
Audio Glitches (TTS)
Problem: Choppy or distorted audio playback
Solutions:
Implement proper audio buffering
Use WebAudio API for smooth playback
Check sample rate matches configuration
Ensure sufficient bandwidth
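One common fix is to schedule each incoming chunk at a running timestamp rather than playing it immediately, so chunks neither overlap nor leave gaps. A sketch assuming 24 kHz mono linear16, as in the TTS configuration above:

const audioContext = new AudioContext({ sampleRate: 24000 });
let playhead = 0; // time (in seconds) at which the next chunk should start

function enqueueAudio(arrayBuffer) {
  // Convert 16-bit PCM to Float32 and wrap it in an AudioBuffer
  const float32 = Float32Array.from(new Int16Array(arrayBuffer), (s) => s / 32768);
  const buffer = audioContext.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);

  const source = audioContext.createBufferSource();
  source.buffer = buffer;
  source.connect(audioContext.destination);

  // Schedule back-to-back playback
  playhead = Math.max(playhead, audioContext.currentTime);
  source.start(playhead);
  playhead += buffer.duration;
}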
Delayed Transcriptions (STT)
Problem: Transcription results lag behind audio
Solutions:
Reduce audio chunk size for faster processing
Check network latency
Verify audio format matches configuration
Use appropriate model for real-time (Deepgram Nova)
Authentication Failures
Problem: Connection rejected with 401
Solutions:
Include Authorization header in connection request
Verify API key is valid and active
Check key hasn't expired
Ensure proper header format: Authorization: Bearer YOUR_KEY
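In Node.js, the ws package lets you pass the header when opening the connection, as sketched below; browser WebSocket clients cannot set custom headers, so check the authentication docs for the browser-side mechanism:

// Sketch: authenticated connection using the Node.js "ws" package
import WebSocket from "ws";

const ws = new WebSocket("wss://api.slng.ai/v1/tts/deepgram/aura:2", {
  headers: { Authorization: "Bearer YOUR_API_KEY" },
});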
Performance Tips
Reuse Connections: Keep the WebSocket open for multiple operations
Batch Text: Send longer text chunks for better efficiency (TTS)
Buffer Audio: Stream audio in appropriate chunk sizes (STT)
Monitor Latency: Track end-to-end latency and optimize
Local Processing: Handle audio encoding/decoding client-side