At a Glance
| | HTTP | WebSocket |
|---|---|---|
| Flow | Request → wait → complete response | Open connection → stream both ways |
| Latency | 200–500 ms | Sub-100 ms |
| Best for | Batch jobs, file conversion | Voice agents, live transcription |
| Complexity | One curl call | Connection lifecycle to manage |
How Each Protocol Works
HTTP
You send a request and get back the full result once processing finishes.
Use HTTP when you:
- Generate audio files for download or storage
- Transcribe pre-recorded audio files
- Want the simplest possible integration
- Don’t need real-time streaming
See complete examples: TTS over HTTP · STT over HTTP
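As a rough illustration of the request → wait → complete response flow, here is a minimal TypeScript sketch. The endpoint URL, the bearer-token header, and the `{ text }` JSON body are placeholders invented for this example, not the actual API; the linked HTTP guides have the real values. The `send` parameter is also an assumption, added so the transport can be swapped out.

```typescript
// Hypothetical endpoint and payload shape, used only for illustration.
const TTS_URL = "https://api.example.com/v1/tts";

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Minimal fetch-like transport so the sketch does not depend on any global.
type Send = (
  url: string,
  init: Omit<HttpRequest, "url">,
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Build the single request that carries the whole job.
function buildTtsRequest(text: string, apiKey: string): HttpRequest {
  return {
    url: TTS_URL,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text }),
  };
}

// Send the request and wait for the complete audio payload.
// No audio is available until the server has finished processing.
async function synthesizeToBuffer(
  text: string,
  apiKey: string,
  send: Send,
): Promise<ArrayBuffer> {
  const { url, ...init } = buildTtsRequest(text, apiKey);
  const response = await send(url, init);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

In a browser or in Node 18+, the global `fetch` can be passed as `send`. The returned buffer can then be written to disk or offered as a download.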
WebSocket Best Practices
WebSocket connections are stateful, so you need to handle the lifecycle:
- Reconnect with backoff. If the connection drops, retry with exponential delay (1s, 2s, 4s… up to 30s).
- Handle binary and text frames. Audio arrives as binary ArrayBuffer; control messages come as JSON text.
- Send a close frame when you’re done so the server releases resources.
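The three practices above can be sketched together in TypeScript. This is one possible arrangement, not the official client: `SocketLike` is a hypothetical minimal interface modeled on the browser `WebSocket`, `openSocket` is an assumed factory (a real client would create the socket there and may need a cast or thin adapter), and the `Frame` type is invented for this example.

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt n (0-based): 1s, 2s, 4s, ... capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

type Frame =
  | { kind: "audio"; bytes: ArrayBuffer }
  | { kind: "control"; message: unknown };

// Binary frames carry audio; text frames carry JSON control messages.
function parseFrame(data: string | ArrayBuffer): Frame {
  if (typeof data === "string") {
    return { kind: "control", message: JSON.parse(data) };
  }
  return { kind: "audio", bytes: data };
}

// Hypothetical subset of the browser WebSocket interface used below.
interface SocketLike {
  binaryType: string;
  onopen: (() => void) | null;
  onmessage: ((event: { data: string | ArrayBuffer }) => void) | null;
  onclose: (() => void) | null;
  close(code?: number): void;
}

function connect(
  openSocket: () => SocketLike,
  onFrame: (frame: Frame) => void,
): { close(): void } {
  let attempt = 0;
  let stopped = false;
  let socket: SocketLike | undefined;

  const open = (): void => {
    socket = openSocket();
    socket.binaryType = "arraybuffer"; // deliver binary frames as ArrayBuffer
    socket.onopen = () => {
      attempt = 0; // a successful connection resets the backoff
    };
    socket.onmessage = (event) => onFrame(parseFrame(event.data));
    socket.onclose = () => {
      if (stopped) return; // closed on purpose, do not reconnect
      setTimeout(open, backoffDelayMs(attempt));
      attempt += 1;
    };
  };
  open();

  return {
    close() {
      stopped = true;
      socket?.close(1000); // 1000 = normal closure; starts the close handshake
    },
  };
}
```

Keeping `backoffDelayMs` and `parseFrame` as pure functions makes them easy to unit test without a live connection. Calling `close()` on the returned handle both sends the close frame and stops further reconnect attempts.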
Which Protocol for Which Use Case?
| Use case | Protocol | Why |
|---|---|---|
| Convert a script to an audio file | HTTP | One request, one file — simple |
| Transcribe a batch of recordings | HTTP | Upload each file, get results |
| Voice agent (phone or web) | WebSocket | Real-time STT→LLM→TTS loop |
| Live captioning / transcription | WebSocket | Stream mic audio, get text back |
| Generate a voiceover for a video | HTTP | No real-time requirement |
Next Steps
- TTS over HTTP: Generate audio files from text with simple HTTP requests.
- TTS over WebSocket: Stream audio in real-time with interruption support.
- STT over HTTP: Transcribe pre-recorded audio files.
- STT over WebSocket: Transcribe live audio from a microphone.