When building Voice AI applications, you will often see references to protocols such as HTTP and WebSocket. Each protocol has its own strengths and trade-offs, which make it suitable for different scenarios.
In this guide, we'll break down the differences between these protocols, their use cases, and when to choose each one for your integration.
The SLNG API supports both HTTP and WebSocket, each optimized for different use cases.
Quick Comparison
| Feature | HTTP | WebSocket |
|---|---|---|
| Direction | Request → Response | Bidirectional |
| Latency | Medium (200-500ms) | Lowest (sub-100ms) |
| Complexity | Simple | Higher |
| Connection | Per request | Persistent |
| Binary Data | Yes | Yes |
| Real-time | No | Yes |
| Best For | Batch processing | Voice agents |
HTTP Protocol
Overview
HTTP follows the traditional request/response pattern: the client sends a request and waits for the complete response.
When to Use
- Batch transcription
- Pre-recorded TTS generation
- Simple integrations
- No real-time requirements
- Stateless operations
Characteristics
Pros:
- Simplest to implement
- No connection management
- Works everywhere
- Easy debugging
- Cacheable responses
Cons:
- Higher latency
- No streaming
- Less efficient for multiple requests
- Waits for complete response
Example: TTS
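A minimal sketch of an HTTP TTS call using only Python's standard library. The base URL, the `/v1/tts` path, the `Bearer` authorization scheme, and the `text`/`voice` body fields are hypothetical placeholders, not the documented SLNG API; substitute the values from the API reference.

```python
import json
import urllib.request

# Placeholders (assumptions): use the real SLNG host, path, and auth scheme.
BASE_URL = "https://api.example.com"
TTS_PATH = "/v1/tts"


def build_tts_request(text: str, api_key: str, voice: str = "default") -> urllib.request.Request:
    """Build a POST request whose JSON body describes the speech to generate."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")  # hypothetical fields
    return urllib.request.Request(
        BASE_URL + TTS_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth header format
            "Content-Type": "application/json",
        },
    )


def synthesize(text: str, api_key: str, out_path: str, timeout: float = 30.0) -> None:
    """Send one request, wait for the complete audio, and write it to out_path."""
    request = build_tts_request(text, api_key)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()  # blocks until the whole audio body has been received
    with open(out_path, "wb") as f:
        f.write(audio)
```

For example, `synthesize("Hello!", api_key="YOUR_API_KEY", out_path="hello.wav")` (assuming the API returns WAV) blocks until the entire clip has been generated, which is the latency trade-off listed in the cons above.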
Example: STT
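The same request/response shape works for batch transcription: upload the whole recording, then wait for the transcript. This sketch assumes a hypothetical `/v1/stt` endpoint that accepts raw audio bytes as the request body and returns JSON with a `text` field; the real SLNG endpoint may expect a multipart upload or different field names.

```python
import json
import urllib.request

# Placeholders (assumptions): use the real SLNG host and path.
BASE_URL = "https://api.example.com"
STT_PATH = "/v1/stt"


def build_stt_request(audio: bytes, api_key: str, content_type: str = "audio/wav") -> urllib.request.Request:
    """Build a POST request that uploads the entire recording as the raw body."""
    return urllib.request.Request(
        BASE_URL + STT_PATH,
        data=audio,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


def parse_transcript(payload: bytes) -> str:
    """Extract the transcript from a JSON response body (hypothetical "text" field)."""
    return json.loads(payload)["text"]


def transcribe_file(path: str, api_key: str, timeout: float = 60.0) -> str:
    """Read a local file, send it in one request, and return the transcript."""
    with open(path, "rb") as f:
        audio = f.read()  # the whole file goes up in a single request
    request = build_stt_request(audio, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_transcript(response.read())
```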
WebSocket Protocol
Overview
WebSocket provides bidirectional, full-duplex communication over a single persistent connection. It offers the lowest latency and the most flexibility, at the cost of more implementation work.
When to Use
- Real-time voice agents
- Interactive applications
- Lowest latency required
- Bidirectional streaming
- Binary data streaming
- Continuous communication
Characteristics
Pros:
- Bidirectional streaming
- Lowest latency (sub-100ms)
- Binary data support
- Most efficient for high-frequency updates
- Real-time feedback
Cons:
- More complex to implement
- Connection management critical
- Reconnection logic needed
- Firewall/proxy issues possible
- Stateful connection
Example: TTS WebSocket
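Over WebSocket, audio can be handled as it arrives instead of after the whole clip is ready. This sketch is written against any connection object with async `send()` and `recv()` methods, which matches client connections from the third-party `websockets` package. The JSON control messages (`speak`, `done`, `error`) are hypothetical, and the sketch assumes binary frames carry audio while text frames carry JSON.

```python
import json
from typing import Callable, Protocol, Union

Frame = Union[str, bytes]


class Connection(Protocol):
    """Any object with async send()/recv(), e.g. a `websockets` client connection."""

    async def send(self, message: Frame) -> None: ...

    async def recv(self) -> Frame: ...


async def stream_tts(conn: Connection, text: str, on_audio: Callable[[bytes], None]) -> int:
    """Request speech for `text` and pass each audio chunk to `on_audio` as it arrives.

    Returns the total number of audio bytes received.
    """
    # Hypothetical request message; the real SLNG message schema may differ.
    await conn.send(json.dumps({"type": "speak", "text": text}))
    received = 0
    while True:
        frame = await conn.recv()
        if isinstance(frame, bytes):
            on_audio(frame)  # binary frame: audio chunk, usable before synthesis finishes
            received += len(frame)
            continue
        message = json.loads(frame)  # text frame: JSON control/status message
        if message.get("type") == "done":  # hypothetical end-of-stream marker
            return received
        if message.get("type") == "error":  # hypothetical error message
            raise RuntimeError(message.get("message", "TTS stream error"))
        # Other message types (e.g. progress updates) are ignored in this sketch.
```

With `websockets`, you might call it as `async with websockets.connect(url) as conn: await stream_tts(conn, "Hello!", player.write)`, where `url` and `player` are placeholders. How the API key is passed during the handshake depends on the SLNG API and on your client library.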
Example: STT WebSocket
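For streaming STT, sending and receiving have to happen at the same time: audio keeps flowing up while partial transcripts come back. This sketch runs a sender and a receiver over one connection with `asyncio.gather`. `conn` is any object with async `send()`/`recv()` (such as a `websockets` connection), and the message types `partial`, `final`, `end`, and `done` are hypothetical placeholders for the real SLNG protocol.

```python
import asyncio
import json
from typing import AsyncIterator, Callable, List


async def stream_stt(conn, audio_chunks: AsyncIterator[bytes],
                     on_message: Callable[[dict], None]) -> List[str]:
    """Stream audio up while reading transcripts back over the same connection.

    `conn` is any object with async send()/recv() (e.g. a `websockets` connection).
    Returns the list of final transcript segments.
    """
    finals: List[str] = []

    async def sender() -> None:
        async for chunk in audio_chunks:
            await conn.send(chunk)  # bytes are sent as binary frames
        # Hypothetical end-of-audio signal so the server can flush its last result.
        await conn.send(json.dumps({"type": "end"}))

    async def receiver() -> None:
        while True:
            message = json.loads(await conn.recv())
            on_message(message)  # partial results can arrive while audio is still uploading
            if message.get("type") == "final":  # hypothetical finalized-segment message
                finals.append(message["text"])
            elif message.get("type") == "done":  # hypothetical end-of-session message
                return

    # Run both directions concurrently: this is the full-duplex part.
    await asyncio.gather(sender(), receiver())
    return finals
```

`audio_chunks` could be an async generator that yields microphone frames as they are captured. If either side raises, `asyncio.gather` propagates the error but does not cancel the other coroutine, so production code may prefer `asyncio.TaskGroup` (Python 3.11+).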
Protocol Selection Guide
Use HTTP When:
- Processing pre-recorded audio
- Batch transcription of files
- Generating audio for download
- Simple, one-off requests
- Caching is beneficial
Use WebSocket When:
- Building voice agents
- Need real-time interaction
- Require lowest latency
- Need bidirectional streaming
- Streaming binary audio data
- Continuous back-and-forth communication
Implementation Best Practices
For All Protocols
- Authentication: Always include your API key in the request headers
- Error Handling: Implement robust error handling
- Rate Limiting: Respect rate limits and implement backoff
- Timeouts: Set appropriate timeouts
- Logging: Log requests for debugging
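The rate-limiting and timeout advice above can be combined into a small retry helper. This sketch assumes retryable failures show up as HTTP 429 or 5xx statuses and that a `Retry-After` value, when present, has already been parsed into seconds. `send` is a hypothetical callable that wraps your actual request (with its own timeout) and returns `(status, retry_after, body)`. The backoff uses "full jitter" so that many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Assumption: statuses worth retrying; adjust to the API's documented behavior.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  retry_after: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter; a server-supplied Retry-After wins."""
    if retry_after is not None:
        return retry_after
    if rng is None:
        rng = random.Random()
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Tuple[int, Optional[float], bytes]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status = None
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status not in RETRYABLE_STATUSES:
            if status >= 400:
                raise RuntimeError(f"request failed with status {status}")
            return body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after=retry_after))
    raise RuntimeError(f"giving up after {max_attempts} attempts (last status {status})")
```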
For WebSocket
- Reconnection Logic: Implement exponential backoff
- Heartbeat/Ping: Keep connection alive
- Message Queuing: Queue messages during reconnection
- State Management: Track connection state
- Binary Handling: Properly handle binary vs text frames
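A heartbeat can be sketched as a background task that pings on an interval and gives up on the connection when a reply does not arrive in time. Here `ping` is a hypothetical coroutine function that finishes when the peer answers, and the interval and timeout defaults are arbitrary, not SLNG recommendations.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat(ping: Callable[[], Awaitable[None]],
                    on_dead: Callable[[], None],
                    interval: float = 15.0,
                    timeout: float = 10.0) -> None:
    """Ping every `interval` seconds; call on_dead() and stop if a ping times out.

    `ping` is a hypothetical coroutine function that completes when the peer answers.
    """
    while True:
        await asyncio.sleep(interval)
        try:
            await asyncio.wait_for(ping(), timeout)
        except asyncio.TimeoutError:
            on_dead()  # e.g. close the socket and hand off to reconnection logic
            return
```

With the `websockets` package, `ping` could be `async def ping(): pong_waiter = await ws.ping(); await pong_waiter`. Note that `websockets` already sends keepalive pings by default (see its `ping_interval` and `ping_timeout` options), so a custom loop like this mainly helps with clients that lack built-in keepalive, or when you want to react to a dead connection yourself.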
Common Patterns
Pattern: HTTP with Polling (Async Job)
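For long batch jobs, an API may accept the work, return a job ID immediately, and let the client poll for the result. This sketch covers the polling half. `get_status` is a hypothetical callable (for example, a GET on a job-status URL) returning a dict whose `status` is `queued`, `running`, `completed`, or `failed`; those names and the job-based flow are assumptions, not the documented SLNG API. The polling interval doubles up to `max_interval` so long jobs are polled less aggressively.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(get_status: Callable[[], Dict[str, Any]],
                 poll_interval: float = 1.0,
                 max_interval: float = 10.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll a job until it reaches a terminal state, backing off between polls."""
    deadline = clock() + timeout
    interval = poll_interval
    while True:
        job = get_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"job failed: {job.get('error', 'unknown error')}")
        # Give up rather than sleep past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # poll less often as the job runs longer
```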
Pattern: WebSocket with Reconnection
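This sketch combines the WebSocket practices listed earlier: explicit connection state, exponential backoff with jitter between reconnection attempts, and a queue that holds outgoing messages while the connection is down and replays them after reconnecting. `connect` is a hypothetical coroutine function returning an object with an async `send()`. Which exceptions count as "connection lost" depends on your client library; with `websockets`, you would likely add `websockets.exceptions.ConnectionClosed` to `retry_on`.

```python
import asyncio
import enum
import random
from collections import deque
from typing import Any, Awaitable, Callable, Deque, Tuple, Type, Union

Frame = Union[str, bytes]


class State(enum.Enum):
    DISCONNECTED = "disconnected"
    CONNECTING = "connecting"
    CONNECTED = "connected"


class ReconnectingClient:
    """Keeps a WebSocket-like connection usable across drops (single-task use)."""

    def __init__(
        self,
        connect: Callable[[], Awaitable[Any]],  # hypothetical: returns an object with async send()
        base_delay: float = 0.5,
        max_delay: float = 30.0,
        max_attempts: int = 10,
        retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    ) -> None:
        self._connect = connect
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._max_attempts = max_attempts
        self._retry_on = retry_on
        self._conn: Any = None
        self._pending: Deque[Frame] = deque()  # messages not yet confirmed sent
        self.state = State.DISCONNECTED

    def mark_disconnected(self) -> None:
        """Call this when a send/recv fails or a heartbeat times out."""
        self.state = State.DISCONNECTED
        self._conn = None

    async def _flush(self) -> None:
        while self._pending:
            await self._conn.send(self._pending[0])
            self._pending.popleft()  # drop only after send() returned without error

    async def ensure_connected(self) -> None:
        """(Re)connect with exponential backoff and full jitter, then replay the queue."""
        attempt = 0
        while self.state is not State.CONNECTED:
            self.state = State.CONNECTING
            try:
                self._conn = await self._connect()
                self.state = State.CONNECTED
                await self._flush()  # deliver messages queued while offline
            except self._retry_on:
                self.mark_disconnected()
                attempt += 1
                if attempt >= self._max_attempts:
                    raise
                delay = min(self._max_delay, self._base_delay * 2 ** attempt)
                await asyncio.sleep(random.uniform(0, delay))

    async def send(self, message: Frame) -> None:
        """Queue the message and deliver it now if connected; otherwise on reconnect."""
        self._pending.append(message)
        if self.state is State.CONNECTED:
            try:
                await self._flush()
            except self._retry_on:
                self.mark_disconnected()  # message stays queued for the next ensure_connected()
```

Because a message leaves the queue only after `send()` succeeds, a message that reached the server just before the connection failed may be sent twice, so the server or your message format needs to tolerate duplicates. The class also assumes only one task calls `send()` at a time.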
Performance Characteristics
Latency Comparison (Typical)
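- HTTP: 200-500ms (each request pays a full request/response round trip and, unless the connection is reused, connection setup)
- WebSocket: 50-100ms (full-duplex over an already-open connection, minimal per-message overhead)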
Throughput Comparison
- HTTP: ~10-50 requests/second (depends on connection pool)
- WebSocket: ~100+ messages/second (per connection)
Resource Usage
- HTTP: Low (stateless, no persistent connections)
- WebSocket: Medium-High (persistent, bidirectional)