Message Flow
Every STT WebSocket session follows the same pattern: connect, send an init config, stream audio chunks, receive interim and final transcripts, then close. For the full list of message types and parameters, see the WebSocket protocol reference.

Quick Start
Connect, initialize a session, stream an audio file, and print the transcription. You need a WAV or raw PCM file to test with — any short speech recording works.

Going further
The WebSocket STT API supports several options you can set in the init config or read from the response:
- Interim vs final transcripts — Partial transcripts update in real time as the user speaks. Final transcripts are confirmed segments that won’t change. Use partials for live captions and finals for processing.
- Language — Pass a `language` code in the init config for better accuracy. Not all models auto-detect.
- Endpointing — Controls how quickly the API finalizes a transcript after silence. Useful for voice agents where you want fast turn-taking.
- Close vs finalize — Send `{ "type": "close" }` when you are done to end the session. Use `{ "type": "finalize" }` to flush results mid-session without disconnecting.
- Keep-alive — For long-running sessions with periods of silence, send `{ "type": "keepalive" }` periodically to prevent idle disconnection.
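The session flow and the options above can be sketched in Python. This is a minimal sketch of the message-building side only; the field names (`"type"`, `"language"`, `"endpointing"`, `"is_final"`, `"text"`) and the 16 kHz PCM framing are illustrative assumptions — check the WebSocket protocol reference for the real schema.

```python
import json
from typing import Iterator, Optional


def build_init(language: str = "en", endpointing_ms: int = 300) -> str:
    # First message after connecting: the init config. A language code
    # improves accuracy on models that don't auto-detect; endpointing
    # controls how quickly finals are emitted after silence.
    # (Field names here are assumed, not taken from the protocol reference.)
    return json.dumps({
        "type": "init",
        "language": language,
        "endpointing": endpointing_ms,
    })


# Control messages from the list above: close ends the session, finalize
# flushes results mid-session, keepalive prevents idle disconnection.
CONTROL = {name: json.dumps({"type": name})
           for name in ("close", "finalize", "keepalive")}


def chunk_pcm(data: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    # 3200 bytes = 100 ms of 16 kHz, 16-bit mono PCM -- a typical frame
    # size for streaming; adjust to your audio format.
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def render_transcript(raw: str) -> Optional[str]:
    # Interim transcripts may still change; finals are confirmed segments.
    msg = json.loads(raw)
    if msg.get("type") != "transcript":
        return None
    tag = "final" if msg.get("is_final") else "interim"
    return f"[{tag}] {msg['text']}"
```

With any WebSocket client library, a session then becomes: send `build_init()` first, stream each `chunk_pcm(...)` frame as a binary message, print `render_transcript(...)` for every incoming text message, and send `CONTROL["finalize"]`, `CONTROL["keepalive"]`, or `CONTROL["close"]` as needed.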
Next Steps
- Live STT demo — Try real-time speech recognition in your browser, no setup needed.
- STT HTTP examples — Simpler integration for pre-recorded files.
- WebSocket protocol — Full message types, parameters, and error codes.
- STT API reference — Endpoint-specific parameters.