Placeholders
The snippets below use these placeholders. Replace them before running the code.

| Placeholder | Replace with |
|---|---|
| SLNG_API_KEY | An SLNG key from app.slng.ai/api-keys. The snippets read it from the SLNG_API_KEY environment variable. |
| recording.wav | A local WAV or raw PCM audio file to transcribe |
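As a small sketch of how a snippet might consume these placeholders, the Python below reads the key from the `SLNG_API_KEY` environment variable and splits `recording.wav` into short PCM chunks using only the standard library. The helper names (`load_api_key`, `read_pcm_chunks`) and the 100 ms chunk size are illustrative assumptions, not part of the SLNG API.

```python
import os
import wave
from typing import Iterator


def load_api_key(env_var: str = "SLNG_API_KEY") -> str:
    """Return the API key from the environment, failing early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def read_pcm_chunks(path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM data from a WAV file in chunks of roughly chunk_ms."""
    with wave.open(path, "rb") as wav:
        # Number of audio frames that cover chunk_ms at this sample rate.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
```

A raw PCM file has no header, so it would be read with a plain `open(path, "rb")` and fixed-size reads instead of the `wave` module.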
Message Flow
Every STT WebSocket session follows the same pattern: connect, initialize the session, stream audio, receive transcripts, then close. For the full list of message types and parameters, see the WebSocket protocol reference.

Quick Start
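As a rough, transport-agnostic sketch of the client side of a session, the generator below yields the messages in the order a client would send them: an init message, the audio chunks, then a close message. The shape of the init message (`"type": "init"` with a nested `"config"` object) and the choice to send audio as binary frames are assumptions made for illustration. Only `{ "type": "close" }` is taken from this page, so check the WebSocket protocol reference for the real schema.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Union

Message = Union[str, bytes]


def session_messages(
    audio_chunks: Iterable[bytes],
    language: Optional[str] = None,
) -> Iterator[Message]:
    """Yield client messages in order: init, audio chunks, then close."""
    config: Dict[str, str] = {}
    if language:
        config["language"] = language
    # Assumed init shape; the real field names may differ.
    yield json.dumps({"type": "init", "config": config})
    for chunk in audio_chunks:
        # Audio is assumed to travel as binary WebSocket frames.
        yield chunk
    # Ends the session once all audio has been sent.
    yield json.dumps({"type": "close"})
```

Each yielded item can be handed to the send call of whichever WebSocket client you use, while a separate reader prints the text frames that come back.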
Connect, initialize a session, stream an audio file, and print the transcription. You need a WAV or raw PCM file to test with. Any short speech recording works.

Going further
The WebSocket protocol supports several options you can set in the init config or take advantage of in the response:
- Interim vs final transcripts: Partial transcripts update in real-time as the user speaks. Final transcripts are confirmed segments that won’t change. Use partials for live captions and finals for processing.
- Language: Pass a `language` code in the init config for better accuracy. Not all models auto-detect.
- Endpointing: Controls how quickly the API finalizes a transcript after silence. Useful for voice agents where you want fast turn-taking.
- Close vs finalize: Send `{ "type": "close" }` when you are done to end the session. Use `{ "type": "finalize" }` to flush results mid-session without disconnecting.
- Keep-alive: For long-running sessions with periods of silence, send `{ "type": "keepalive" }` periodically to prevent idle disconnection.
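The options above could be handled on the client with something like the following sketch. The three control payloads (`close`, `finalize`, `keepalive`) come from this page. Everything else is an assumption for illustration: the server message fields (`"type"` of `"partial"` or `"final"`, plus `"text"`), the class names, and the 5 second keep-alive interval are hypothetical, so confirm the real values in the WebSocket protocol reference.

```python
import json
import time
from typing import List, Optional

# Control messages as described on this page.
CLOSE = json.dumps({"type": "close"})          # end the session
FINALIZE = json.dumps({"type": "finalize"})    # flush results, stay connected
KEEPALIVE = json.dumps({"type": "keepalive"})  # prevent idle disconnection


class KeepAliveTimer:
    """Tracks the last send time and reports when a keepalive is due."""

    def __init__(self, interval_s: float = 5.0) -> None:
        self.interval_s = interval_s  # assumed interval, not a documented value
        self.last_sent = time.monotonic()

    def mark_sent(self, now: Optional[float] = None) -> None:
        self.last_sent = time.monotonic() if now is None else now

    def due(self, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        return current - self.last_sent >= self.interval_s


class TranscriptBuffer:
    """Keeps confirmed finals plus the latest partial for live captions."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial = ""

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        text = msg.get("text", "")
        # Assumed field names; the real response schema may differ.
        if msg.get("type") == "partial":
            self.partial = text  # partials replace each other
        elif msg.get("type") == "final":
            self.finals.append(text)  # finals will not change again
            self.partial = ""

    def caption(self) -> str:
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)
```

In a send loop, you would call `mark_sent()` after every outgoing message and send `KEEPALIVE` whenever `due()` returns true. `caption()` suits a live display, while `finals` holds the text that is safe to process downstream.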
Next Steps
- Live STT demo: Try real-time speech recognition in your browser, no setup needed.
- STT HTTP examples: Simpler integration for pre-recorded files.
- WebSocket protocol: Full message types, parameters, and error codes.
- STT API reference: Endpoint-specific parameters.