Voice AI applications commonly rely on three protocols: HTTP, Server-Sent Events (SSE), and WebSocket. Each has its own strengths and trade-offs, which makes it a better fit for some scenarios than others.
This guide breaks down how the protocols differ, what each is used for, and when to choose each for your integration.
The SLNG API supports all three as integration protocols.
Quick Comparison
| Feature | HTTP | SSE | WebSocket |
|---|---|---|---|
| Direction | Request → Response | Server → Client | Bidirectional |
| Latency | Medium (200-500ms) | Low (100-200ms) | Lowest (sub-100ms) |
| Complexity | Simple | Medium | Higher |
| Connection | Per request | Persistent | Persistent |
| Binary Data | ✅ Yes | ❌ No | ✅ Yes |
| Real-time | ❌ No | ✅ One-way | ✅ Bidirectional |
| Best For | Batch processing | Progressive TTS | Voice agents |
HTTP Protocol
Overview
The traditional request/response pattern: the client sends a request and waits for the complete response.
When to Use
- Batch transcription
- Pre-recorded TTS generation
- Simple integrations
- No real-time requirements
- Stateless operations
Characteristics
Pros:
- Simplest to implement
- No connection management
- Works everywhere
- Easy debugging
- Cacheable responses
Cons:
- Higher latency
- No streaming
- Less efficient for multiple requests
- Waits for complete response
Example: TTS
Code
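A minimal sketch of an HTTP TTS call using only the Python standard library. The base URL, the `/v1/tts` path, the JSON field names, and the `Bearer` auth scheme are assumptions for illustration, not taken from the SLNG API reference; substitute the real values.

```python
import json
import urllib.request

# Hypothetical values: replace with the real base URL, path, and field names.
BASE_URL = "https://api.example.com"
TTS_PATH = "/v1/tts"


def build_tts_request(api_key: str, text: str, voice: str = "default") -> urllib.request.Request:
    """Build (but do not send) a POST request that asks for synthesized audio."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + TTS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key: str, text: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the complete audio body in one piece."""
    request = build_tts_request(api_key, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Because the call returns only after the full body has arrived, the caller cannot start playback until synthesis has finished, which is the main latency cost of plain HTTP.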
Example: STT
Code
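A minimal sketch of batch transcription over HTTP, assuming the endpoint accepts the raw audio file as the request body and replies with JSON. The base URL, the `/v1/stt` path, and the `Bearer` auth scheme are hypothetical placeholders.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical
STT_PATH = "/v1/stt"  # hypothetical


def build_stt_request(api_key: str, audio: bytes, content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is the raw audio file."""
    return urllib.request.Request(
        BASE_URL + STT_PATH,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": content_type,
        },
    )


def transcribe(api_key: str, audio: bytes, timeout: float = 60.0) -> dict:
    """Upload the audio and return the parsed JSON transcription result."""
    request = build_stt_request(api_key, audio)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```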
Server-Sent Events (SSE)
Overview
Server pushes updates to client over a persistent HTTP connection. One-way streaming from server to client.
When to Use
- Progressive TTS generation
- Streaming transcriptions
- Real-time updates
- Integrations that should stay simpler than WebSocket
- No need for client-to-server streaming
Characteristics
Pros:
- Progressive responses
- Lower latency than HTTP
- Simpler than WebSocket
- Automatic reconnection (built into the browser `EventSource` API)
- Works through firewalls
Cons:
- One-way only (server → client)
- Text-based (binary data such as audio must be encoded, e.g. as base64)
- More complex than HTTP
- Connection management needed
Example: Streaming TTS
Code
Client Implementation
Code
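A minimal sketch of the client side: a parser for the `text/event-stream` line format, plus a helper that reassembles audio from it. The `audio` event name and the JSON payload with a base64 `chunk` field are assumptions for illustration; the real event shapes may differ.

```python
import base64
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one {"event", "data"} dict per event in a text/event-stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):  # comment line, often used as a keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def collect_audio(lines: Iterable[str]) -> bytes:
    """Concatenate base64 audio chunks (event name and JSON shape are assumed)."""
    audio = bytearray()
    for evt in parse_sse(lines):
        if evt["event"] == "audio":
            audio += base64.b64decode(json.loads(evt["data"])["chunk"])
    return bytes(audio)
```

In a real client, `lines` would be the decoded lines of the HTTP response body, and each chunk could be handed to the audio player as soon as it is decoded rather than collected first.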
WebSocket Protocol
Overview
Bidirectional, full-duplex communication over a single persistent connection. It offers the lowest latency of the three protocols and is the most flexible.
When to Use
- Real-time voice agents
- Interactive applications
- Lowest latency required
- Bidirectional streaming
- Binary data streaming
- Continuous communication
Characteristics
Pros:
- Bidirectional streaming
- Lowest latency (sub-100ms)
- Binary data support
- Most efficient for high-frequency updates
- Real-time feedback
Cons:
- Most complex to implement
- Connection management critical
- Reconnection logic needed
- Firewall/proxy issues
- Stateful connection
Example: TTS WebSocket
Code
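A minimal sketch of the message handling for a TTS WebSocket session, independent of any particular client library. It assumes text is sent as JSON text frames and audio comes back as binary frames, with JSON control messages of type `done` or `error`; all of these message shapes are hypothetical.

```python
import json
from typing import Union


def text_message(text: str, flush: bool = False) -> str:
    """Encode a text chunk as a JSON text frame (field names are hypothetical)."""
    return json.dumps({"type": "text", "text": text, "flush": flush})


def handle_frame(frame: Union[str, bytes], audio: bytearray) -> bool:
    """Route one incoming frame. Returns False once the server signals completion."""
    if isinstance(frame, bytes):  # binary frame: raw audio
        audio.extend(frame)
        return True
    message = json.loads(frame)  # text frame: JSON control message
    if message.get("type") == "error":
        raise RuntimeError(message.get("message", "unknown error"))
    return message.get("type") != "done"
```

With a client library such as `websockets`, which delivers text frames as `str` and binary frames as `bytes`, each received message can be passed straight to `handle_frame`.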
Example: STT WebSocket
Code
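A minimal sketch of the sending side of an STT WebSocket session: splitting captured audio into small fixed-duration chunks, each of which would go out as one binary frame. The 16 kHz, 16-bit mono format and the 20 ms frame size are assumptions; use whatever format the endpoint expects.

```python
from typing import Iterator


def pcm_frames(
    audio: bytes,
    sample_rate: int = 16000,  # assumed sample rate in Hz
    frame_ms: int = 20,        # assumed frame duration
    sample_width: int = 2,     # bytes per sample (16-bit PCM)
) -> Iterator[bytes]:
    """Split mono PCM audio into fixed-duration chunks for binary frames."""
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]
```

Smaller frames reduce the delay before the server sees new audio, at the cost of more messages per second. The last frame may be shorter than the others.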
Protocol Selection Guide
Use HTTP When:
- Processing pre-recorded audio
- Batch transcription of files
- Generating audio for download
- Simple, one-off requests
- Caching is beneficial
Use SSE When:
- Need progressive TTS streaming
- Want something simpler than WebSocket
- Only need server → client updates
- Building streaming narration
- Don't need to send updates to server
Use WebSocket When:
- Building voice agents
- Need real-time interaction
- Require lowest latency
- Need bidirectional streaming
- Streaming binary audio data
- Continuous back-and-forth communication
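The selection rules above can be condensed into a small decision function. This is an illustrative sketch, not part of any SDK; the `Protocol` enum and both parameter names are made up for the example.

```python
from enum import Enum


class Protocol(Enum):
    HTTP = "http"
    SSE = "sse"
    WEBSOCKET = "websocket"


def choose_protocol(*, client_streams_audio: bool, needs_progressive_output: bool) -> Protocol:
    """Pick the simplest protocol that satisfies the streaming requirements."""
    if client_streams_audio:
        # Client-to-server streaming requires a bidirectional connection.
        return Protocol.WEBSOCKET
    if needs_progressive_output:
        # Server-to-client streaming alone is covered by SSE.
        return Protocol.SSE
    # Everything else is a one-off request/response.
    return Protocol.HTTP
```

The order of the checks reflects the design choice of preferring the simplest option: WebSocket is selected only when the client itself must stream.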
Implementation Best Practices
For All Protocols
- Authentication: Always include your API key in the request headers
- Error Handling: Implement robust error handling
- Rate Limiting: Respect rate limits and implement backoff
- Timeouts: Set appropriate timeouts
- Logging: Log requests for debugging
For SSE
- Reconnection: Implement automatic reconnection if your client library does not already provide it
- Event Parsing: Properly parse event stream format
- Buffering: Handle partial messages
- Cleanup: Close connections when done
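The buffering point deserves a concrete illustration: network reads can end in the middle of a line, so a client should hold back the incomplete tail until the rest arrives. A minimal sketch, with `LineBuffer` as a made-up helper name; for brevity it treats only `\n` and `\r\n` as line endings.

```python
class LineBuffer:
    """Accumulate raw network chunks and release only complete lines."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list[str]:
        """Add a chunk and return every line completed by it."""
        self._pending += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *complete, self._pending = self._pending.split(b"\n")
        return [line.rstrip(b"\r").decode("utf-8") for line in complete]
```

Decoding happens only on complete lines, so a multi-byte UTF-8 character split across two chunks is never decoded in halves.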
For WebSocket
- Reconnection Logic: Implement exponential backoff
- Heartbeat/Ping: Keep connection alive
- Message Queuing: Queue messages during reconnection
- State Management: Track connection state
- Binary Handling: Properly handle binary vs text frames
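Message queuing and state tracking can be combined in one small wrapper around the socket's send function. This is a sketch under assumptions: `QueuedSender`, its callbacks, and the bounded queue size are illustrative choices, not part of any library.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque


class State(Enum):
    CONNECTING = "connecting"
    OPEN = "open"
    CLOSED = "closed"


class QueuedSender:
    """Send immediately while connected; otherwise queue until the next open."""

    def __init__(self, send: Callable[[bytes], None], max_queued: int = 256) -> None:
        self._send = send
        # Bounded queue: the oldest messages are dropped once it is full.
        self._queue: Deque[bytes] = deque(maxlen=max_queued)
        self.state = State.CONNECTING

    def send(self, message: bytes) -> None:
        if self.state is State.OPEN:
            self._send(message)
        else:
            self._queue.append(message)

    def on_open(self) -> None:
        """Call when the connection (re)opens; flushes queued messages in order."""
        self.state = State.OPEN
        while self._queue:
            self._send(self._queue.popleft())

    def on_close(self) -> None:
        """Call when the connection drops; later sends are queued."""
        self.state = State.CLOSED
```

For live audio a bounded queue is usually preferable to an unbounded one, since replaying many seconds of stale audio after a reconnect rarely helps.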
Common Patterns
Pattern: HTTP with Polling (Async Job)
Code
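A minimal sketch of the polling loop, assuming a job resource whose JSON has a `status` field that ends in `completed` or `failed` (both the field and its values are hypothetical). The status lookup is passed in as a callable so the loop stays independent of the HTTP client.

```python
import time
from typing import Callable


def poll_job(
    get_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() until the job reaches a terminal state or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job["status"] in ("completed", "failed"):  # assumed terminal states
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)
```

In practice `get_status` would issue a GET request for the job and return the parsed JSON body.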
Pattern: SSE with Fallback
Code
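A minimal sketch of this pattern, assuming the fallback is a plain HTTP request that returns the whole result at once. Both transports are passed in as callables (`open_stream` and `fetch_all` are made-up names), so the sketch makes no claims about the actual client code.

```python
from typing import Callable, Iterable, Iterator


def stream_with_fallback(
    open_stream: Callable[[], Iterable[bytes]],
    fetch_all: Callable[[], bytes],
) -> Iterator[bytes]:
    """Yield streamed chunks; fall back to one full response if streaming fails early."""
    received_any = False
    try:
        for chunk in open_stream():
            received_any = True
            yield chunk
    except (ConnectionError, TimeoutError):
        if received_any:
            # Part of the audio was already delivered; a full refetch would duplicate it.
            raise
        yield fetch_all()
```

The fallback is used only when the stream fails before delivering anything; a failure mid-stream is re-raised so the caller can decide how to recover.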
Pattern: WebSocket with Reconnection
Code
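A minimal sketch of reconnection with exponential backoff and jitter. The `connect` callable stands in for whatever opens the WebSocket in your client library; the base delay, cap, and attempt limit are illustrative defaults, not recommended values from the API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential delay for a 0-based attempt number, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Retry connect() on connection errors, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:  # ConnectionError is a subclass of OSError
            if attempt == max_attempts - 1:
                raise
            # Jitter: sleep a random fraction of the exponential delay.
            sleep(backoff_delay(attempt) * jitter())
    raise ValueError("max_attempts must be at least 1")
```

The jitter spreads out reconnect attempts so that many clients disconnected at the same moment do not all retry in lockstep.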
Performance Characteristics
Latency Comparison (Typical)
- HTTP: 200-500ms (includes full round trip)
- SSE: 100-200ms (persistent connection, progressive)
- WebSocket: 50-100ms (full-duplex, minimal overhead)
Throughput Comparison
- HTTP: ~10-50 requests/second (depends on connection pool)
- SSE: ~1-10 streams/second (limited by concurrent connections)
- WebSocket: ~100+ messages/second (per connection)
Resource Usage
- HTTP: Low (stateless, no persistent connections)
- SSE: Medium (persistent connections, server push)
- WebSocket: Medium-High (persistent, bidirectional)