SLNG runs three stages between your orchestrator and the models you use, one for each part of the voice pipeline. Each removes a category of unnecessary compute.

Request lifecycle of a single turn

When a caller speaks:
  1. Audio arrives from the caller via your orchestrator.
  2. STT Routing selects the transcription model based on language, accent, noise profile, and cost constraints.
  3. The transcript goes either to your orchestrator’s LLM or to SLNG’s Tiered Decisioning.
  4. Tiered Decisioning determines whether the turn needs full inference, local inference, or can be resolved without calling the LLM.
  5. The response (LLM, cached, or deterministic) is sent to TTS.
  6. Output Assembly checks cache, assembles from segments where possible, and generates only what is new.
  7. Audio returns to the caller.
These stages do not have to run in strict sequence. Assembly can begin before the LLM has finished its full response.
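
The overlap described above can be sketched as follows. This is a minimal illustration, not SLNG's implementation: `split_sentences`, `handle_turn`, and the `transcribe`, `respond`, and `synthesize` callables are hypothetical names standing in for the STT, LLM, and TTS stages. It assumes the LLM streams text chunks and that a sentence boundary is a reasonable point to hand text to speech output.

```python
from typing import Callable, Iterable, Iterator


def split_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Emit each sentence as soon as it is complete, without waiting for the full stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if buffer.rstrip().endswith((".", "?", "!")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush trailing text that never reached a sentence boundary.
        yield buffer.strip()


def handle_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], Iterable[str]],
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """One turn: speech-to-text, streamed response, then speech output per sentence."""
    transcript = transcribe(audio)
    for sentence in split_sentences(respond(transcript)):
        # Speech for this sentence is produced while the LLM is still generating the next.
        yield synthesize(sentence)
```

Because both functions are generators, audio for the first sentence is available before the response stream has been fully consumed.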

Three stages

STT Routing (private beta)

The first stage. When audio arrives, it is routed to the right STT model for that specific interaction. A Hindi-speaking caller in Mumbai gets a different model than an English-speaking caller in New York. A noisy environment may route to a model with better noise handling. The layer balances accuracy, latency, and cost per turn. See STT Routing for details.
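
One way to picture this trade-off is a scoring function over candidate models. The sketch below is an assumption for illustration only: the `SttModel` fields, the model names, the numbers, and the weights in `score` are all made up, and SLNG's actual routing logic is not described here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SttModel:
    name: str
    languages: FrozenSet[str]
    accuracy: float          # clean-audio accuracy, 0..1 (illustrative)
    noise_robustness: float  # accuracy under noise, 0..1 (illustrative)
    latency_ms: float
    cost_per_min: float      # cost per audio minute (illustrative)


def route_stt(models: Iterable[SttModel], language: str, noisy: bool,
              max_cost_per_min: float) -> SttModel:
    """Pick the best-scoring model that supports the language and fits the budget."""
    eligible = [m for m in models
                if language in m.languages and m.cost_per_min <= max_cost_per_min]
    if not eligible:
        raise LookupError(f"no STT model for {language!r} within budget")

    def score(m: SttModel) -> float:
        # Noisy audio weighs noise robustness equally with clean accuracy.
        quality = (m.accuracy + m.noise_robustness) / 2 if noisy else m.accuracy
        # Hypothetical weights: penalize latency (in seconds) and cost.
        return quality - 0.5 * (m.latency_ms / 1000) - 5 * m.cost_per_min

    return max(eligible, key=score)


# Hypothetical catalogue; none of these are real model names.
MODELS = [
    SttModel("hi-specialist", frozenset({"hi"}), 0.93, 0.70, 180, 0.010),
    SttModel("en-fast", frozenset({"en"}), 0.92, 0.55, 120, 0.006),
    SttModel("multi-robust", frozenset({"en", "hi"}), 0.88, 0.90, 260, 0.012),
]
```

With these made-up numbers, `route_stt(MODELS, "hi", noisy=False, max_cost_per_min=0.02)` selects `hi-specialist`, while a noisy English call under the same budget selects `multi-robust`. Tightening the budget to 0.008 sends the noisy English call back to `en-fast`.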

Tiered Decisioning

The second stage. Not every turn needs full LLM inference. A consent disclosure repeated for the hundredth time does not need a reasoning model. It needs a deterministic, low-latency response. Intelligence is allocated where it is needed and skipped where it is not. See Tiered Decisioning for details.
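
A minimal sketch of the idea, assuming an upstream classifier has already labelled the turn with an intent and a confidence score. The tier names mirror the three outcomes listed in the request lifecycle. The intent labels, the scripted text, and the 0.8 threshold are hypothetical, not SLNG's configuration.

```python
from enum import Enum
from typing import Optional, Tuple


class Tier(Enum):
    DETERMINISTIC = "deterministic"  # resolved without calling an LLM
    LOCAL = "local"                  # handled by local inference
    FULL = "full"                    # needs full LLM inference


# Hypothetical scripted turns and locally answerable intents.
SCRIPTED_RESPONSES = {
    "consent_disclosure": "This call may be recorded for quality purposes.",
}
LOCAL_INTENTS = {"confirm_appointment", "repeat_last"}


def decide_tier(intent: str, confidence: float,
                min_confidence: float = 0.8) -> Tuple[Tier, Optional[str]]:
    """Return the cheapest tier that can handle the turn, plus a scripted reply if any."""
    if confidence >= min_confidence:
        if intent in SCRIPTED_RESPONSES:
            return Tier.DETERMINISTIC, SCRIPTED_RESPONSES[intent]
        if intent in LOCAL_INTENTS:
            return Tier.LOCAL, None
    # Unknown or low-confidence turns fall through to full inference.
    return Tier.FULL, None
```

Falling back to full inference whenever the classifier is unsure keeps the cheaper tiers from answering turns they were not meant to handle.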

Output Assembly

The third stage. TTS on the execution layer is assembly, not generation. When a turn produces text for speech, the layer checks whether this output, or parts of it, has been assembled before. On a miss, the missing audio is synthesized, cached, and served from cache the next time it is needed. See Output Assembly for details.
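
The check-then-synthesize flow can be sketched with a segment-level cache. This is an illustration under stated assumptions: `SegmentCache` and its methods are hypothetical names, segments are taken to be sentences, and audio is joined by plain byte concatenation, which a production system would likely replace with proper audio stitching.

```python
import hashlib
import re
from typing import Callable, Dict


class SegmentCache:
    """Caches synthesized audio per (voice, sentence) and reuses it across turns."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(voice: str, segment: str) -> str:
        # Collapse whitespace so trivially different strings share one entry.
        normalized = " ".join(segment.split())
        return hashlib.sha256(f"{voice}\x00{normalized}".encode("utf-8")).hexdigest()

    def assemble(self, voice: str, text: str) -> bytes:
        """Build audio for text, synthesizing only the segments not yet cached."""
        segments = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
        audio = []
        for segment in segments:
            key = self._key(voice, segment)
            if key in self._store:
                self.hits += 1
            else:
                self.misses += 1
                self._store[key] = self._synthesize(voice, segment)
            audio.append(self._store[key])
        return b"".join(audio)
```

If two responses end with the same closing sentence, the second call to `assemble` synthesizes only its first sentence and reuses the cached closing.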

Regional execution

All three stages run across a global edge network. By default, requests route to the lowest-latency region. You can pin to specific regions for compliance or performance. See Regional Execution for details.
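
The default-versus-pinned behavior amounts to a minimum over an optionally restricted set. In the sketch below, `pick_region`, the region names, and the latency figures are hypothetical, and how latencies are actually measured is not covered here.

```python
from typing import Dict, Iterable, Optional


def pick_region(latency_ms: Dict[str, float],
                pinned: Optional[Iterable[str]] = None) -> str:
    """Return the lowest-latency region, restricted to pinned regions when given."""
    allowed = None if pinned is None else set(pinned)
    candidates = {region: ms for region, ms in latency_ms.items()
                  if allowed is None or region in allowed}
    if not candidates:
        raise ValueError("no measured region matches the pinned set")
    return min(candidates, key=candidates.get)


# Hypothetical measurements for one caller.
measured = {"us-east": 95.0, "eu-west": 40.0, "ap-south": 210.0}
default_region = pick_region(measured)                    # lowest latency overall
pinned_region = pick_region(measured, pinned=["us-east"])  # compliance pin wins
```

Pinning trades latency for control: the pinned call above stays in `us-east` even though `eu-west` is faster for this caller.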

Continuous improvement

As more calls flow through the layer, routing decisions sharpen, cache coverage grows, and tier allocation becomes more accurate. Every call makes the next one more efficient.