The LLM-tiering behavior on this page is being rolled out and verified.
Contact us to discuss availability for your account.
Adaptive execution paths
In a traditional voice pipeline, every turn takes the same path: audio in, STT, LLM, TTS, audio out. The same pipeline and the same cost, whether the caller asked a complex question or said “yes.” The execution layer constructs the path for each turn based on the full context of the conversation: what was said, what has been said before, how the call is progressing, and what the turn actually requires. The decision is not just which model to call, but whether to call a model at all, which stages to invoke, and how to assemble the response. Same API call from your side. The path is constructed for you.
Paths adapt in real time
Within a single call, the path can change turn by turn. The first turn might need full reasoning to understand intent. The next few might resolve through shorter paths as the conversation enters a familiar pattern. Then a turn introduces new complexity and the path expands again. The decision factors are compound (language, intent, similarity to prior turns, availability of cached responses, model load, regional constraints) and they are evaluated together.
How it improves over time
The layer learns from every call:
- Path intelligence sharpens with volume. Early on, more turns take the full path as a safe default. Over time, a growing proportion route through faster, cheaper paths.
- Your call patterns shape your optimization. A healthcare scheduling agent develops different path intelligence than a financial-services collections agent.
- Improvement is continuous. You do not retrain anything. The layer observes outcomes and the next call benefits.
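As a rough illustration of the behavior described above, the sketch below picks a path for a single turn by evaluating several factors together and falling back to the full path when there is little history. Everything in it is hypothetical: the `Path` names, the `TurnContext` fields, and the `min_calls` and `threshold` values are assumptions made for illustration, not the layer's actual logic or API.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    # Hypothetical path names, chosen for this sketch only.
    CACHED = "cached"  # reuse a prior response, no model call
    SHORT = "short"    # lighter handling for a familiar turn
    FULL = "full"      # full reasoning path through the LLM


@dataclass
class TurnContext:
    # Illustrative stand-ins for some of the decision factors named above.
    similarity_to_prior: float  # 0.0 to 1.0, similarity to earlier turns
    cached_response: bool       # a cached response is available
    new_intent: bool            # the turn introduces new complexity
    observed_calls: int         # how much history the layer has seen


def choose_path(ctx: TurnContext, min_calls: int = 1000,
                threshold: float = 0.9) -> Path:
    """Pick a path for one turn; thresholds are assumed values."""
    # Safe default: too little history, or a new intent, takes the full path.
    if ctx.observed_calls < min_calls or ctx.new_intent:
        return Path.FULL
    # Familiar turn: prefer a cached response, else a shorter path.
    if ctx.similarity_to_prior >= threshold:
        return Path.CACHED if ctx.cached_response else Path.SHORT
    return Path.FULL
```

With these assumed thresholds, the same familiar turn takes the full path while `observed_calls` is low and a shorter path once enough history has accumulated, which mirrors the "safe default early, cheaper paths later" behavior in the list above.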
What this means
| Without tiered decisioning | With tiered decisioning |
|---|---|
| Every turn hits the LLM | Only turns that need reasoning hit the LLM |
| Latency is constant regardless of turn complexity | Simple turns resolve faster through a shorter path |
| Cost scales linearly with call volume | Cost per call decreases as the layer learns your patterns |
Configuration
LLM model selection and failover are configured on the voice agent:
| Setting | Default | Description |
|---|---|---|
| `llm_first_token_timeout_s` | 6.0 | Maximum wait, in seconds, for the first token from the LLM before failover |
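To make the first-token timeout concrete, here is a minimal sketch of the general technique: wait up to a limit for the first streamed token, and move on to the next model if none arrives. It is not the platform's implementation. The function `stream_with_failover`, the `ModelFn` type, and the two fake models are hypothetical names introduced for illustration; only the 6.0 default comes from the table above.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Sequence

# Default taken from the table above; everything else here is assumed.
LLM_FIRST_TOKEN_TIMEOUT_S = 6.0

# Hypothetical model interface: takes a prompt, streams tokens.
ModelFn = Callable[[str], AsyncIterator[str]]


async def _first_token(stream: AsyncIterator[str]) -> str:
    # Wrapped in a coroutine so asyncio.wait_for can cancel it cleanly.
    return await stream.__anext__()


async def stream_with_failover(
    prompt: str,
    models: Sequence[ModelFn],
    timeout_s: float = LLM_FIRST_TOKEN_TIMEOUT_S,
) -> List[str]:
    """Try each model in order; fail over if the first token is late."""
    for model in models:
        stream = model(prompt)
        try:
            first = await asyncio.wait_for(_first_token(stream), timeout=timeout_s)
        except (asyncio.TimeoutError, StopAsyncIteration):
            continue  # no first token in time (or empty stream): next model
        tokens = [first]
        # The timeout applies only to the first token, not the whole stream.
        async for token in stream:
            tokens.append(token)
        return tokens
    raise RuntimeError("no model produced a first token in time")


async def slow_model(prompt: str) -> AsyncIterator[str]:
    # Fake primary model that is slow to start streaming.
    await asyncio.sleep(1.0)
    yield "late"


async def fast_model(prompt: str) -> AsyncIterator[str]:
    # Fake fallback model that streams immediately.
    yield "hello"
    yield "world"


if __name__ == "__main__":
    # A short timeout keeps the demo quick; the documented default is 6.0.
    print(asyncio.run(stream_with_failover("hi", [slow_model, fast_model], timeout_s=0.05)))
```

In this sketch the timeout covers only the wait for the first token, so a model that starts streaming in time is not cut off mid-response. Whether the platform behaves the same way after the first token is not stated on this page.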