The LLM-tiering behavior on this page is being rolled out and verified. Contact us to discuss availability for your account.
Not every turn in a voice call needs full LLM inference. A consent disclosure repeated for the hundredth time. A hold message. A standard greeting. An acknowledgment like “got it, one moment.” These do not need a reasoning model. They need a deterministic, low-latency response. Tiered decisioning allocates intelligence where it is needed and skips it where it is not.

Adaptive execution paths

In a traditional voice pipeline, every turn takes the same path: audio in, STT, LLM, TTS, audio out. The pipeline and the cost are the same whether the caller asked a complex question or said “yes.” With tiered decisioning, the execution layer instead constructs the path for each turn based on the full context of the conversation: what was said, what has been said before, how the call is progressing, and what the turn actually requires. The decision is not just which model to call, but whether to call a model at all, which stages to invoke, and how to assemble the response. The API call from your side stays the same. The path is constructed for you.
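As a rough illustration of "whether to call a model at all," the sketch below builds a per-turn list of stages. It is not the execution layer's actual logic: the stage names, the `CANNED_UTTERANCES` set, and `build_path` are all hypothetical, and a real decision would draw on far more context than the transcript alone.

```python
from typing import List

# Hypothetical stage names for the full pipeline described above.
FULL_PATH = ["stt", "llm", "tts"]

# Hypothetical set of utterances that can be answered deterministically.
CANNED_UTTERANCES = {"yes", "no", "hold on"}


def build_path(transcript: str) -> List[str]:
    """Return the stages to run for one turn (illustrative only)."""
    normalized = transcript.strip().lower().rstrip(".!?")
    if normalized in CANNED_UTTERANCES:
        # Short path: skip LLM inference and use a deterministic reply.
        return ["stt", "canned_reply", "tts"]
    # Safe default: the full reasoning path.
    return list(FULL_PATH)
```

Under these assumptions, a bare “Yes.” resolves without the `llm` stage, while an open-ended request takes the full path.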

Paths adapt in real time

Within a single call, the path can change turn by turn. The first turn might need full reasoning to understand intent. The next few might resolve through shorter paths as the conversation enters a familiar pattern. Then a turn introduces new complexity and the path expands again. The decision factors are compound (language, intent, similarity to prior turns, availability of cached responses, model load, regional constraints) and they are evaluated together.
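The turn-by-turn behavior can be pictured with a small sketch in which several factors are evaluated together rather than one at a time. Everything here is an assumption made for illustration: the `TurnFactors` fields, the 0.9 threshold, and the rule itself are hypothetical, and only a subset of the factors listed above is modeled.

```python
from dataclasses import dataclass


@dataclass
class TurnFactors:
    similarity_to_prior: float       # hypothetical score in [0.0, 1.0]
    cached_response_available: bool  # a reusable response exists
    intent_is_known: bool            # intent was already resolved


def needs_full_reasoning(f: TurnFactors, similarity_threshold: float = 0.9) -> bool:
    # The short path is taken only when all conditions agree;
    # any single doubt falls back to full reasoning.
    short_path_ok = (
        f.cached_response_available
        and f.intent_is_known
        and f.similarity_to_prior >= similarity_threshold
    )
    return not short_path_ok


# One call, four turns: new intent, familiar pattern twice, then new complexity.
call = [
    TurnFactors(0.10, False, False),
    TurnFactors(0.95, True, True),
    TurnFactors(0.97, True, True),
    TurnFactors(0.40, True, False),
]
paths = ["full" if needs_full_reasoning(t) else "short" for t in call]
```

With these inputs, `paths` follows the pattern described above: full reasoning first, two shorter paths, then full reasoning again.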

How it improves over time

The layer learns from every call:
  • Path intelligence sharpens with volume. Early on, more turns take the full path as a safe default. Over time, a growing proportion route through faster, cheaper paths.
  • Your call patterns shape your optimization. A healthcare scheduling agent develops different path intelligence than a financial-services collections agent.
  • Improvement is continuous. You do not retrain anything. The layer observes outcomes and the next call benefits.
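One way to picture "full path as a safe default, shorter paths over time" is a counter of observed outcomes per turn pattern. This is a toy model, not a description of how the layer learns: `PathLearner`, the pattern labels, and the `min_observations` threshold are hypothetical.

```python
from collections import Counter


class PathLearner:
    """Toy model: promote a pattern to the short path after repeated successes."""

    def __init__(self, min_observations: int = 3):
        self.min_observations = min_observations
        self.successes = Counter()  # consecutive successes per pattern

    def route(self, pattern: str) -> str:
        # Unseen or rarely seen patterns take the full path by default.
        if self.successes[pattern] >= self.min_observations:
            return "short"
        return "full"

    def record_outcome(self, pattern: str, succeeded: bool) -> None:
        if succeeded:
            self.successes[pattern] += 1
        else:
            # A bad outcome resets the pattern to the safe default.
            self.successes[pattern] = 0
```

In this sketch nothing is retrained: each recorded outcome simply changes how the next turn with the same pattern is routed.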

What this means

Without tiered decisioning | With tiered decisioning
Every turn hits the LLM | Only turns that need reasoning hit the LLM
Latency is constant regardless of turn complexity | Simple turns resolve faster through a shorter path
Cost scales linearly with call volume | Cost per call decreases as the layer learns your patterns

Configuration

LLM model selection and failover are configured on the voice agent:
{
  "models": {
    "llm": "groq/moonshotai/kimi-k2-instruct-0905",
    "fallbacks": {
      "llm": ["groq/moonshotai/kimi-k2-instruct-0905"]
    },
    "llm_first_token_timeout_s": 6.0
  }
}
Setting | Default | Description
llm_first_token_timeout_s | 6.0 | Maximum wait, in seconds, for the first token from the LLM before failover
See Configuration & Tools for the full model configuration reference.
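To make the first-token timeout concrete, here is a minimal sketch of the failover pattern using `asyncio`. It assumes a hypothetical `start_stream` callable that returns an async iterator of tokens for a given model name; it is not the platform's implementation, and the platform applies this behavior for you based on the configuration above.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Tuple


async def stream_with_failover(
    models: List[str],
    start_stream: Callable[[str], AsyncIterator[str]],  # hypothetical client hook
    first_token_timeout_s: float = 6.0,
) -> Tuple[str, List[str]]:
    """Try each model in order; move on if the first token is late."""
    for model in models:
        stream = start_stream(model)
        try:
            # Only the wait for the first token is bounded by the timeout.
            first = await asyncio.wait_for(
                stream.__anext__(), timeout=first_token_timeout_s
            )
        except (asyncio.TimeoutError, StopAsyncIteration):
            continue  # fail over to the next model in the list
        rest = [token async for token in stream]
        return model, [first] + rest
    raise RuntimeError("no model produced a first token in time")
```

Here `models` would be the primary model followed by the entries in `fallbacks.llm`. Once a model has produced its first token, the rest of its stream is consumed without a further deadline.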