Geography
Route to the nearest region. Data stays in-jurisdiction. A caller in Frankfurt hits EU infrastructure. A caller in Mumbai hits AP South. No configuration needed: this is the default behavior. Override with X-Region-Override when you need a specific datacenter, or X-World-Part-Override when you need to stay within a geographic zone. See Regional Execution.
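As a rough illustration of the two overrides, the sketch below builds the request headers for each case. Only the header names come from the text above. The helper name `routing_headers` and the values `"eu-central"` and `"eu"` are placeholders, since the accepted region and zone identifiers are not listed here.

```python
from typing import Dict, Optional

# Header names as given in the text above.
REGION_HEADER = "X-Region-Override"
WORLD_PART_HEADER = "X-World-Part-Override"


def routing_headers(
    region: Optional[str] = None,
    world_part: Optional[str] = None,
) -> Dict[str, str]:
    """Return the override headers to attach to a request.

    With no arguments the result is empty, which leaves the default
    nearest-region routing in place.
    """
    headers: Dict[str, str] = {}
    if region:
        # Pin the request to one specific datacenter.
        headers[REGION_HEADER] = region
    if world_part:
        # Keep the request inside a geographic zone.
        headers[WORLD_PART_HEADER] = world_part
    return headers


# Placeholder values: substitute the identifiers your deployment documents.
default_headers = routing_headers()
pinned_headers = routing_headers(region="eu-central")
zoned_headers = routing_headers(world_part="eu")
```

Merge the returned dictionary into the headers of whatever HTTP client you already use. An empty dictionary means no override is sent.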
Compliance
Execution constraints determine where data and models can operate. Healthcare, banking, and insurance workloads have requirements about where audio can be processed and where transcripts can exist. Regional execution respects these constraints per request.
Cost and latency
Cached path, local inference, or full reasoning, based on the input. The layer makes this decision per turn:
- A greeting that has been said a thousand times: cached path.
- A simple acknowledgment: local inference, no LLM call.
- A complex question requiring reasoning: full inference path.
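The per-turn decision above can be pictured as a small three-way classifier. This is a toy sketch and not the layer's actual logic: the phrase sets, the `Path` enum, and the `choose_path` function are all hypothetical, and a real system would likely rely on richer signals than exact phrase matching.

```python
from enum import Enum


class Path(Enum):
    CACHED = "cached path"
    LOCAL = "local inference"
    FULL = "full inference path"


# Hypothetical phrase sets, for illustration only.
CACHED_GREETINGS = {"hello", "hi", "good morning"}
ACKNOWLEDGMENTS = {"ok", "okay", "yes", "no", "thanks", "got it"}


def normalize(text: str) -> str:
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" .,!?").split())


def choose_path(turn: str) -> Path:
    """Pick an execution path for a single conversational turn."""
    phrase = normalize(turn)
    if phrase in CACHED_GREETINGS:
        # Seen many times before: serve the cached response.
        return Path.CACHED
    if phrase in ACKNOWLEDGMENTS:
        # Simple acknowledgment: handle locally, no LLM call.
        return Path.LOCAL
    # Anything else is treated as needing full reasoning.
    return Path.FULL


for turn in ["Hello!", "Got it.", "Why was my claim denied last month?"]:
    print(f"{turn!r} -> {choose_path(turn).value}")
```

The cheapest checks run first and the expensive path is the fallback, so an unrecognized input is never answered from cache by mistake.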