You can let visitors talk to an SLNG voice agent directly in the browser. You need two pieces: a small backend endpoint that creates the web session (so your API key never reaches the client), and a frontend that connects to LiveKit for real-time audio.

Prerequisites

  • A configured Voice Agent with its agent ID
  • An SLNG API key (get one at app.slng.ai)
  • A backend you can deploy server-side code to (Node.js, Deno, Python, etc.)
  • A frontend project with React (the examples below use React, but the LiveKit client SDK works with any framework)

How it works

  1. The browser asks your backend to start a session.
  2. Your backend calls the SLNG web-sessions endpoint and forwards the LiveKit credentials back.
  3. The browser connects to the LiveKit room, publishes the mic, and plays the agent’s audio.

Step 1: Create a backend endpoint

Your backend proxies the SLNG API so the API key never reaches the browser.
Never call the SLNG API directly from client-side code. Your SLNG_API_KEY must stay server-side.
The only call you need is:
POST https://api.slng.ai/v1/agents/{agent_id}/web-sessions
import express from "express";

const app = express();
app.use(express.json());

const SLNG_API_KEY = process.env.SLNG_API_KEY;
const SLNG_AGENT_ID = process.env.SLNG_AGENT_ID;

app.post("/api/session", async (req, res) => {
  try {
    const response = await fetch(
      `https://api.slng.ai/v1/agents/${SLNG_AGENT_ID}/web-sessions`,
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${SLNG_API_KEY}`,
        },
        body: JSON.stringify({
          arguments: {},
          // Forward an optional display name for the web participant
          ...(req.body.participant_name
            ? { participant_name: req.body.participant_name }
            : {}),
        }),
      }
    );

    const data = await response.json();
    if (!response.ok) {
      return res.status(response.status).json({ error: "SLNG API error", details: data });
    }
    res.json(data);
  } catch (err) {
    // Without this, a network failure would crash the handler as an unhandled rejection
    res.status(502).json({ error: "Failed to reach SLNG API" });
  }
});

app.listen(3001);
The response includes the fields you need for the frontend:
{
  "livekit_url": "wss://...",
  "livekit_token": "...",
  "call_id": "...",
  "max_session_seconds": 300
}
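Before connecting, it can help to fail fast when the backend returns an error shape instead of credentials. A minimal guard, sketched here as a hypothetical helper (assertSessionData is not part of any SDK; field names follow the sample response above):

```javascript
// Hypothetical guard: verify the /api/session response before connecting.
// Field names match the sample web-session response shown above.
function assertSessionData(data) {
  const required = ["livekit_url", "livekit_token", "call_id"];
  const missing = required.filter(
    (key) => typeof data[key] !== "string" || data[key] === ""
  );
  if (missing.length > 0) {
    throw new Error(`Invalid session response, missing: ${missing.join(", ")}`);
  }
  return data;
}
```

On the frontend you would call it as `const session = assertSessionData(await res.json());` so a bad response surfaces as a clear error rather than a cryptic LiveKit connection failure.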

Step 2: Install the LiveKit client SDK

npm install livekit-client

Step 3: Connect to the LiveKit room

Call your backend to get a session, then connect to the room:
import { Room, RoomEvent, createLocalAudioTrack } from "livekit-client";

async function startSession() {
  // 1. Get session credentials from your backend
  const res = await fetch("/api/session", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ participant_name: "visitor" }),
  });
  const session = await res.json();

  // 2. Create and connect to the LiveKit room
  const room = new Room({ adaptiveStream: true, dynacast: true });
  await room.connect(session.livekit_url, session.livekit_token);

  // 3. Publish your microphone
  const micTrack = await createLocalAudioTrack();
  await room.localParticipant.publishTrack(micTrack);

  return { room, micTrack, session };
}
The browser will prompt the user for microphone access on createLocalAudioTrack(). If your page is not served over HTTPS, most browsers will block the request.

Step 4: Play the agent’s audio

Attach the agent’s remote audio track to the DOM so the browser plays it:
room.on(RoomEvent.TrackSubscribed, (track) => {
  if (track.kind !== "audio") return;
  const el = track.attach();
  el.autoplay = true;
  document.body.appendChild(el);
});

room.on(RoomEvent.TrackUnsubscribed, (track) => {
  track.detach().forEach((el) => el.remove());
});

Step 5: Show live transcripts

Transcript updates arrive over a LiveKit data channel on the slng.transcript.v1 topic:
const TRANSCRIPT_TOPIC = "slng.transcript.v1";

room.on(RoomEvent.DataReceived, (payload, _participant, _kind, topic) => {
  if (topic !== TRANSCRIPT_TOPIC) return;

  const msg = JSON.parse(new TextDecoder().decode(payload));

  if (msg.type === "conversation_item_added" && msg.item) {
    const { id, role, content, created_at } = msg.item;
    const text = Array.isArray(content) ? content.join("\n") : String(content);
    // Append { id, role, content: text, created_at } to your transcript state
  }
});
Each transcript item has:
  • id: Unique message ID (use it to deduplicate)
  • role: "user" or "assistant"
  • content: The transcribed text (string or array of strings)
  • created_at: Timestamp
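Because data messages can repeat or arrive out of order, a small pure helper keeps the transcript deduplicated by id and ordered by created_at. This is a sketch, not an SDK API, and it assumes created_at is an ISO-8601 string (so lexicographic order matches chronological order):

```javascript
// Sketch: merge an incoming transcript item into the list,
// deduplicating by id and keeping items ordered by created_at.
// Assumes created_at is an ISO-8601 timestamp string.
function mergeTranscriptItem(items, item) {
  if (items.some((t) => t.id === item.id)) return items;
  return [...items, item].sort((a, b) =>
    String(a.created_at).localeCompare(String(b.created_at))
  );
}
```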

Step 6: Add mute and disconnect controls

// Toggle mute: pass the current muted state, get the new state back
async function toggleMute(micTrack, muted) {
  if (muted) {
    await micTrack.unmute();
  } else {
    await micTrack.mute();
  }
  return !muted;
}

// End the session
async function disconnect(room, micTrack) {
  micTrack.stop();
  await room.disconnect();
}
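The max_session_seconds field from the session response (Step 1) tells you how long the session may last, so you can show a countdown and hang up before the server does. A sketch, where formatRemaining and scheduleAutoDisconnect are hypothetical helpers:

```javascript
// Sketch: format remaining seconds as m:ss for a countdown display.
function formatRemaining(totalSeconds) {
  const clamped = Math.max(0, Math.floor(totalSeconds));
  const minutes = Math.floor(clamped / 60);
  const seconds = clamped % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}

// Sketch: disconnect shortly before the server would end the session.
// Pass your disconnect function from above; returns the timer id.
function scheduleAutoDisconnect(maxSessionSeconds, disconnectFn) {
  const graceMs = 2000; // hang up 2s early; adjust to taste
  return setTimeout(disconnectFn, Math.max(0, maxSessionSeconds * 1000 - graceMs));
}
```

Call `scheduleAutoDisconnect(session.max_session_seconds, () => disconnect(room, micTrack))` after connecting, and clear the timer if the user ends the call first.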

Step 7: Detect who is speaking

The active-speakers event tells you when the agent is talking, so you can drive a visual indicator or avatar animation:
room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
  const agentIsSpeaking = speakers.some(
    (p) => p.identity !== room.localParticipant.identity
  );
  // Update your UI based on agentIsSpeaking
});
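The check above is easier to unit-test if you extract it as a pure function over participant identities (a refactoring sketch, not an SDK API):

```javascript
// Sketch: true when any active speaker other than the local participant is talking.
function isAgentSpeaking(activeSpeakerIdentities, localIdentity) {
  return activeSpeakerIdentities.some((identity) => identity !== localIdentity);
}
```

In the event handler you would call it as `isAgentSpeaking(speakers.map((p) => p.identity), room.localParticipant.identity)`.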

Optional: Add a visual persona

A voice-only interface gives users no visual cue about what the agent is doing. Adding an animated persona — an orb, waveform, or avatar — makes the experience feel more responsive. Two ready-made libraries work well here:

Vercel AI SDK Persona

A React component with built-in states: idle, listening, speaking, thinking. Drop it in and map LiveKit events to states.

ElevenLabs Conversational UI

Orb and avatar components designed for voice interfaces, with audio-reactive animations.
To wire either library up, map your session and LiveKit events to persona states:
// Derive a persona state from your session + LiveKit events
function getPersonaState({ status, muted, agentIsSpeaking }) {
  if (status === "connecting") return "idle";
  if (status === "ended") return "idle";
  if (agentIsSpeaking) return "speaking";
  if (muted) return "thinking";
  return "listening";
}

// Update on active-speaker changes (Step 7)
room.on(RoomEvent.ActiveSpeakersChanged, (speakers) => {
  const agentIsSpeaking = speakers.some(
    (p) => p.identity !== room.localParticipant.identity
  );
  setPersonaState(getPersonaState({ status, muted, agentIsSpeaking }));
});
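Note the priority order in getPersonaState: connection status wins over speaking, and speaking wins over muted. Restated as runnable checks (the function is copied from above so this snippet stands alone):

```javascript
// Copy of getPersonaState from above, so this snippet runs standalone.
function getPersonaState({ status, muted, agentIsSpeaking }) {
  if (status === "connecting") return "idle";
  if (status === "ended") return "idle";
  if (agentIsSpeaking) return "speaking";
  if (muted) return "thinking";
  return "listening";
}

// Speaking takes precedence over muted while the session is active...
console.assert(getPersonaState({ status: "active", muted: true, agentIsSpeaking: true }) === "speaking");
// ...but a non-active status overrides everything.
console.assert(getPersonaState({ status: "ended", muted: false, agentIsSpeaking: true }) === "idle");
console.assert(getPersonaState({ status: "active", muted: false, agentIsSpeaking: false }) === "listening");
```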

Putting it all together

A minimal React component with all the steps above wired together:
import { useEffect, useRef, useState, useCallback } from "react";
import {
  Room,
  RoomEvent,
  createLocalAudioTrack,
  type LocalAudioTrack,
  type RemoteTrack,
} from "livekit-client";

const TRANSCRIPT_TOPIC = "slng.transcript.v1";

interface SessionData {
  livekit_url: string;
  livekit_token: string;
  call_id: string;
  max_session_seconds: number;
}

interface TranscriptItem {
  id: string;
  role: "user" | "assistant";
  content: string;
}

export default function VoiceSession() {
  const [status, setStatus] = useState<"idle" | "connecting" | "active" | "ended">("idle");
  const [muted, setMuted] = useState(false);
  const [transcript, setTranscript] = useState<TranscriptItem[]>([]);
  const roomRef = useRef<Room | null>(null);
  const micRef = useRef<LocalAudioTrack | null>(null);

  const start = useCallback(async () => {
    setStatus("connecting");

    // Get session from your backend
    const res = await fetch("/api/session", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({}),
    });
    const session: SessionData = await res.json();

    const room = new Room({ adaptiveStream: true, dynacast: true });
    roomRef.current = room;

    // Play agent audio
    room.on(RoomEvent.TrackSubscribed, (track: RemoteTrack) => {
      if (track.kind !== "audio") return;
      const el = track.attach() as HTMLAudioElement;
      el.autoplay = true;
      document.body.appendChild(el);
    });

    room.on(RoomEvent.TrackUnsubscribed, (track: RemoteTrack) => {
      track.detach().forEach((el) => el.remove());
    });

    // Transcript
    room.on(
      RoomEvent.DataReceived,
      (payload: Uint8Array, _p: unknown, _k: unknown, topic?: string) => {
        if (topic !== TRANSCRIPT_TOPIC) return;
        try {
          const msg = JSON.parse(new TextDecoder().decode(payload));
          if (msg.type === "conversation_item_added" && msg.item) {
            const { id, role, content } = msg.item;
            const text = Array.isArray(content) ? content.join("\n") : String(content);
            setTranscript((prev) =>
              prev.find((t) => t.id === id) ? prev : [...prev, { id, role, content: text }]
            );
          }
        } catch {
          // ignore malformed messages
        }
      }
    );

    room.on(RoomEvent.Disconnected, () => setStatus("ended"));

    await room.connect(session.livekit_url, session.livekit_token);

    const micTrack = await createLocalAudioTrack();
    micRef.current = micTrack;
    await room.localParticipant.publishTrack(micTrack);

    setStatus("active");
  }, []);

  const toggleMute = async () => {
    if (!micRef.current) return;
    if (muted) {
      await micRef.current.unmute();
    } else {
      await micRef.current.mute();
    }
    setMuted((m) => !m);
  };

  const disconnect = useCallback(async () => {
    micRef.current?.stop();
    micRef.current = null;
    await roomRef.current?.disconnect();
    roomRef.current = null;
    setStatus("ended");
  }, []);

  // Cleanup on unmount
  useEffect(() => {
    return () => {
      micRef.current?.stop();
      roomRef.current?.disconnect();
    };
  }, []);

  return (
    <div>
      {status === "idle" && <button onClick={start}>Start conversation</button>}
      {status === "connecting" && <p>Connecting…</p>}

      {status === "active" && (
        <div>
          <button onClick={toggleMute}>{muted ? "Unmute" : "Mute"}</button>
          <button onClick={disconnect}>End call</button>
        </div>
      )}

      {status === "ended" && <p>Session ended.</p>}

      <ul>
        {transcript.map((item) => (
          <li key={item.id}>
            <strong>{item.role}:</strong> {item.content}
          </li>
        ))}
      </ul>
    </div>
  );
}

Next steps