STT Routing is in PRIVATE BETA. The behavior described here is being rolled out gradually. Contact us for access.
STT routing is the first stage of the execution layer. When audio arrives, the layer routes it to the speech-to-text (STT) model best suited to that specific interaction. For example, a Hindi caller in Mumbai gets a different model than an English caller in New York, and a noisy environment may route to a model with better noise handling. The layer balances accuracy, latency, and cost per turn.

How it routes

Routing weighs several inputs together, not in isolation:
  • Language and accent of the audio
  • Noise profile of the environment
  • Regional availability of models
  • Cost and latency constraints
For voice agent calls, routing happens per turn. Each turn can route to the model best suited to that specific audio segment.
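
To make "together, not in isolation" concrete, here is a minimal sketch of per-turn routing. It is an illustration only, not the production router: the names ModelProfile, TurnContext, and route_turn, the example model catalog, the scoring formula, and the 0.1 weights are all assumptions made for this example. Language, regional availability, latency, and cost act as hard filters, and the remaining candidates are ranked by a single score that blends accuracy with noise handling.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical description of one STT model (not a real catalog entry)."""
    name: str
    languages: frozenset       # language codes the model supports
    regions: frozenset         # regions where the model is available
    accuracy: float            # 0..1, quality on clean audio
    noise_robustness: float    # 0..1, quality on noisy audio
    latency_ms: float
    cost_per_minute: float


@dataclass(frozen=True)
class TurnContext:
    """Hypothetical signals extracted from one turn of audio."""
    language: str
    region: str
    noise_level: float         # 0 (clean) .. 1 (very noisy)
    max_latency_ms: float      # must be > 0
    max_cost_per_minute: float # must be > 0


def route_turn(turn: TurnContext, models: Sequence[ModelProfile]) -> Optional[ModelProfile]:
    """Pick a model for a single turn, or None if no model qualifies."""
    # Hard constraints: language, regional availability, latency, cost.
    candidates = [
        m for m in models
        if turn.language in m.languages
        and turn.region in m.regions
        and m.latency_ms <= turn.max_latency_ms
        and m.cost_per_minute <= turn.max_cost_per_minute
    ]
    if not candidates:
        return None

    def score(m: ModelProfile) -> float:
        # Blend clean-audio accuracy and noise robustness by the turn's noise level.
        quality = (1 - turn.noise_level) * m.accuracy + turn.noise_level * m.noise_robustness
        # Assumed weights: small penalties for using up the latency and cost budgets.
        latency_penalty = 0.1 * (m.latency_ms / turn.max_latency_ms)
        cost_penalty = 0.1 * (m.cost_per_minute / turn.max_cost_per_minute)
        return quality - latency_penalty - cost_penalty

    return max(candidates, key=score)


# Example catalog with made-up models and numbers.
MODELS = [
    ModelProfile("general-en", frozenset({"en"}), frozenset({"us", "in"}), 0.95, 0.60, 200, 0.010),
    ModelProfile("noisy-en", frozenset({"en"}), frozenset({"us"}), 0.90, 0.90, 300, 0.015),
    ModelProfile("general-hi", frozenset({"hi"}), frozenset({"in"}), 0.92, 0.70, 250, 0.012),
]

# Two turns of the same call can land on different models.
quiet_turn = TurnContext("en", "us", noise_level=0.1, max_latency_ms=500, max_cost_per_minute=0.02)
noisy_turn = TurnContext("en", "us", noise_level=0.9, max_latency_ms=500, max_cost_per_minute=0.02)
print(route_turn(quiet_turn, MODELS).name)
print(route_turn(noisy_turn, MODELS).name)
```

In this sketch the quiet turn selects the cheaper, faster general model, while the noisy turn of the same call selects the noise-robust model, which is the per-turn behavior described above.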

Today

Until STT Routing is generally available, you select the STT model explicitly on each request. See the Speech-to-Text Overview for the current model list and how to choose.

Speech-to-Text Overview

The models available today and how to pick one.

How It Works

Where STT routing sits in the pipeline.