Phone Call

Besides chatting with the Leena AI agent or using the in-app voice call, users can call a real phone number and speak to the agent over a regular phone line. The phone call experience supports the same capabilities as chat (knowledge queries, ticket actions, workflows, and AI Colleagues) and is reachable from any phone, anywhere, without the Leena web app open.

This is the natural extension of the existing voice call capability. The key difference is the entrypoint: instead of clicking the voice call icon inside the Leena web app, the user simply dials a phone number provisioned for their organization.

How to Start a Phone Call

Getting started is as simple as making any phone call:

  • Dial the number: Each customer is provisioned with a dedicated phone number through Leena's telephony provider. Users dial this number from any phone (mobile or landline).
  • Authenticate: Because the call is coming in over a public phone line and not from a logged-in session, the user is asked to verify their identity before the agent can act on their behalf. See Phone Call Authentication below.
  • Start talking: Once authenticated, the user can speak to the agent the same way they would over the web voice call — ask questions, raise tickets, initiate workflows, and so on.

Phone Call Authentication

Unlike the in-app voice call, where the user is already logged in and the agent inherits that identity, a phone call lands on the system without any prior session context. To prevent unauthorized access to employee data, every phone call goes through an authentication step at the start of the call.

Leena AI supports two authentication methods for phone calls: OTP, which is the default, and Knowledge-Based Authentication (KBA). Each instance uses exactly one of these methods.

OTP (default)

OTP is the simpler and recommended method when the caller is dialling in from their own registered phone number. The flow is short and feels familiar to anyone who has used OTP login before:

  • When the call connects, Leena uses the caller's phone number (the from-number on the inbound call) to look up the matching employee record. The lookup checks the phone fields on the employee profile.
  • If a unique employee is found, Leena sends a 6-digit OTP to that same phone number over SMS.
  • The agent asks the caller to enter the OTP on the keypad. The caller types it in via DTMF.
  • On a successful match, the caller is authenticated and the conversation begins.

The caller gets up to 3 attempts to enter the OTP correctly. Separately, OTP requests are rate limited: if too many OTPs are requested in a short window, the call ends with a "too many attempts" message and the caller is asked to try again later.
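As a rough illustration of the OTP steps above, the following Python sketch shows the caller-number lookup, OTP generation, and the 3-attempt check. The Employee class, the function names, and the in-memory directory are hypothetical stand-ins, not Leena's actual implementation. SMS delivery and the OTP request rate limit are left out.

```python
import hmac
import secrets
from dataclasses import dataclass

MAX_OTP_ATTEMPTS = 3  # from the text: up to 3 attempts to enter the OTP


@dataclass
class Employee:
    # Hypothetical shape of an employee record.
    employee_id: str
    phone_numbers: tuple  # all phone fields on the employee profile


def find_employee_by_caller(directory, from_number):
    """Return the employee only if exactly one record carries this number."""
    matches = [e for e in directory if from_number in e.phone_numbers]
    return matches[0] if len(matches) == 1 else None


def generate_otp():
    """Generate a 6-digit OTP from a cryptographically secure source."""
    return f"{secrets.randbelow(10**6):06d}"


def verify_otp(expected, entries):
    """Check successive keypad entries against the OTP, up to the attempt cap."""
    for entered in entries[:MAX_OTP_ATTEMPTS]:
        # Constant-time comparison; DTMF input is ASCII digits.
        if hmac.compare_digest(entered, expected):
            return True
    return False
```

Requiring a unique match in find_employee_by_caller mirrors the "if a unique employee is found" condition: a number shared by two records does not identify anyone.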

The practical constraint of OTP is that the caller must be dialling in from a phone number that is already on their employee record. If the caller is using a different phone — a desk phone, a colleague's phone, a hotel phone — OTP cannot identify them, and KBA is the right method instead.

KBA (alternative)

KBA is the alternative when OTP isn't a fit. Typical reasons: the organization cannot guarantee that callers will always dial in from their registered number, the customer prefers not to depend on SMS delivery for authentication, or the employee database doesn't hold a registered number for every employee.

With KBA, the agent asks the caller a short, fixed sequence of identity-verification questions at the start of the call, and the answers are matched against the employee record on file:

  • The agent prompts the caller for the first mandatory parameter (for example, "Please enter the last 4 digits of your SSN, followed by the hash key").
  • The caller responds via the keypad (DTMF). The system supports variable-length input — the caller presses # to indicate they are done.
  • The agent then prompts for the second mandatory parameter (for example, "Please enter the last 4 digits of your Employee ID, followed by the hash key").
  • Once both inputs are collected, Leena performs a single lookup against the employee directory using the combined values.
  • If exactly one employee matches, the caller is authenticated and the call continues.
  • If more than one employee matches (a collision), the agent asks for a tie-breaker parameter to disambiguate.
  • If no match is found after the configured number of retries (default: 3 global attempts), the agent informs the caller that authentication has failed and ends the call.

The caller also gets per-step retries for malformed input (for example, entering too few digits), separate from the overall lookup attempts.
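The KBA sequence above can be sketched as follows. This is a minimal illustration under stated assumptions: the Employee fields, the choice of SSN and Employee ID as mandatory parameters, date of birth as the tie-breaker, and the exact-match comparison for the tie-breaker are all hypothetical. Per-step retries for malformed input are not modeled.

```python
from dataclasses import dataclass

MAX_LOOKUP_ATTEMPTS = 3  # default number of global attempts, per the text


@dataclass(frozen=True)
class Employee:
    # Hypothetical record; values are stored as digit strings.
    employee_id: str
    ssn: str
    date_of_birth: str  # assumed tie-breaker field, e.g. DDMMYYYY


def lookup(directory, ssn_last4, emp_id_last4):
    """Single lookup that uses both mandatory values together."""
    return [
        e for e in directory
        if e.ssn.endswith(ssn_last4) and e.employee_id.endswith(emp_id_last4)
    ]


def authenticate(directory, attempts, tie_breaker=None):
    """Run KBA over a list of (ssn_last4, emp_id_last4) attempts.

    Returns the matched Employee, or None if authentication fails.
    """
    for ssn_last4, emp_id_last4 in attempts[:MAX_LOOKUP_ATTEMPTS]:
        matches = lookup(directory, ssn_last4, emp_id_last4)
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1 and tie_breaker is not None:
            # Collision: narrow the candidates with the tie-breaker value.
            narrowed = [e for e in matches if e.date_of_birth == tie_breaker]
            if len(narrowed) == 1:
                return narrowed[0]
    return None
```

Collecting both values before a single combined lookup means a caller never learns which of the two inputs was wrong, which is one plausible reason for the design described above.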

What admins configure

Phone call authentication is configured per bot from the dashboard. The relevant settings are:

  • Phone number: the E.164-formatted number assigned to the bot (e.g. +14155551234).
  • Phone auth method: either OTP or KBA. OTP is the default.
  • KBA configuration (only when method is KBA):
    • Mandatory parameters: exactly two profile fields the caller must provide. For each parameter, the admin specifies the field name (e.g. SSN, employeeId, dateOfBirth), the prompt text the agent will speak, and which digits to extract: the first or the last 4 to 6 digits of the value.
    • Tie-breaker parameter: exactly one additional profile field used only when two or more employees match the mandatory parameters.
  • Spam detection: an optional rate limit on inbound calls per number, configured as a maximum call count within a time window in minutes.

When KBA configuration is created or changed, Leena automatically re-runs the user sync in the background so that the underlying lookup data reflects the latest employee records.
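To make the configuration rules concrete, here is a small Python sketch that validates settings of the kind listed above and shows how "first or last N digits" extraction could work. The class, function, and parameter names are hypothetical and do not reflect the dashboard's actual schema. The E.164 pattern is the standard one: a plus sign followed by up to 15 digits, the first of which is not zero.

```python
import re
from dataclasses import dataclass

E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


@dataclass
class KbaParameter:
    field_name: str   # e.g. "SSN", "employeeId", "dateOfBirth"
    prompt_text: str  # what the agent speaks
    take_from: str    # "first" or "last"
    digit_count: int  # between 4 and 6

    def extract(self, value):
        """Keep only digits, then take the first or last digit_count of them."""
        digits = re.sub(r"\D", "", value)
        if self.take_from == "first":
            return digits[: self.digit_count]
        return digits[-self.digit_count:]


def validate_config(phone_number, auth_method="OTP", mandatory=(), tie_breaker=None):
    """Return a list of problems; an empty list means the config is valid."""
    errors = []
    if not E164_PATTERN.match(phone_number):
        errors.append("phone number must be in E.164 format")
    if auth_method not in ("OTP", "KBA"):
        errors.append("auth method must be OTP or KBA")
    if auth_method == "KBA":
        if len(mandatory) != 2:
            errors.append("KBA needs exactly two mandatory parameters")
        for p in mandatory:
            if p.take_from not in ("first", "last"):
                errors.append(f"{p.field_name}: take_from must be 'first' or 'last'")
            if not 4 <= p.digit_count <= 6:
                errors.append(f"{p.field_name}: digit count must be between 4 and 6")
        if tie_breaker is None:
            errors.append("KBA needs exactly one tie-breaker parameter")
    return errors
```

For example, a parameter configured as the last 4 digits of SSN would turn a stored value such as "123-45-6789" into "6789" before matching.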

During the Call

Once authenticated, the call experience is the same conversational AI experience as the in-app voice call: natural turn-taking, real-time transcription, instant acknowledgements, progress updates while the backend executes, barge-in support, and graceful handling of long-running tasks.

The one practical difference relative to web voice call is that the agent cannot show anything on a screen during a phone call. The agent therefore handles non-speakable content — links, forms, attachments, source citations — differently. See the next section.

Transcript and Links: Where the Information Goes

A phone call does not have a screen. So when the agent has something to share that does not work as spoken audio (a portal link, a document URL, a knowledge article reference, a confirmation page), Leena routes it to two places: the user's web chat history and an SMS to the user's phone.

Transcript on the web app

The full transcript of the phone call — both the caller's utterances and the agent's responses — is persisted to the user's chat history on the Leena web app. After the call ends, the user can sign in to the web app and see the entire conversation, in chronological order, alongside their regular chat history with the agent. Any structured outputs the agent surfaced during the call (links, sources, attachments) are rendered in the web chat exactly as they would be in a normal text conversation. This is the durable record of the call.

The transcript is built up live during the call from the streaming ASR, finalised once the call ends, and stored against the employee identity that authenticated at the start of the call. Because the same identity backs the web app session, the user does not need any extra step to see it — it just shows up in their chat history.

Links and key actions over SMS

Some content is time-sensitive — the user needs the link now, while still on the call, not after they hang up and log in. For this, the agent verbally tells the caller "I've sent that link to you on text message," and Leena pushes an SMS to the caller's number containing the relevant URL.

Because URLs cannot be spoken cleanly over voice, the orchestrator is channel-aware: on the in-app voice call it points users to the web chat; on a phone call it promises an SMS. The SMS is sent in real time during the call so the user can tap the link without waiting for the conversation to end.
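The channel-aware behavior described above could look roughly like the sketch below. The Channel enum, the Delivery structure, and the in-app spoken phrase are assumptions made for illustration; only the phone-call phrase and the routing targets come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Channel(Enum):
    IN_APP_VOICE = "in_app_voice"
    PHONE_CALL = "phone_call"


@dataclass
class Delivery:
    spoken: str                                   # what the agent says aloud
    web_chat: list = field(default_factory=list)  # items rendered in chat history
    sms: list = field(default_factory=list)       # items pushed over SMS


def route_link(channel, url):
    """Decide where a non-speakable URL goes, based on the call channel."""
    # The link always lands in the web chat, which is the durable record.
    delivery = Delivery(spoken="", web_chat=[url])
    if channel is Channel.PHONE_CALL:
        # No screen on a phone call: also send the link by SMS in real time.
        delivery.sms.append(url)
        delivery.spoken = "I've sent that link to you on text message."
    else:
        # Hypothetical wording for the in-app voice call.
        delivery.spoken = "I've added that link to your chat."
    return delivery
```

Keeping the decision in one function means the agent's spoken promise and the actual delivery path cannot drift apart.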

Prerequisite: SMS Must Be Enabled as a Channel

For phone calls to deliver the full experience, SMS must be enabled and configured as a channel on the bot. Specifically, the bot must have the Twilio SMS channel configured (a Twilio number for outbound SMS), and outbound SMS templates must be approved for the relevant regions.

If SMS is not configured, the phone call itself will still work — the user can still authenticate, ask questions, and complete workflows that don't require a link. But any in-call action that produces a URL or non-speakable content will not be delivered to the caller, because there is no channel to deliver it on. This is not a graceful experience and is not recommended for production rollouts. SMS should be treated as a hard prerequisite for enabling phone call on a tenant.

The dependency exists because phone-call link delivery is implemented as an SMS message addressed to the caller's number, sent through the same SMS plumbing used elsewhere in the platform (OTP delivery, notifications, magic links). Reusing this path means link delivery inherits the same provider routing, template compliance, and delivery logging as every other outbound SMS.

Behind the Scenes

The phone call architecture reuses the same orchestration stack as the in-app voice call. The Voice Orchestrator (perimeter fast path), Core Orchestrator (Leena Autonomous Agent), real-time ASR, and TTS layers are all identical. The only meaningful differences are call origination and the channel-aware link handling described above:

  • In-app voice call: audio is captured in the browser and streamed to the telephony provider over a secure WebSocket. The agent's responses, including links, render directly in the web chat.
  • Phone call: audio originates from the public telephone network and is routed to Leena via the telephony provider's inbound number. The agent's responses are spoken back, with non-speakable content routed to SMS and the full transcript persisted to the web chat.

For a full description of the request lifecycle, latency design, security and data residency, and the role of each component (Retell.ai, Voice Orchestrator, Core Orchestrator, TTS), refer to the Voice Call page. The same architecture, observability, retention, and hosting guarantees apply to phone calls.

Limits and Defaults

  • Auto-termination: the call ends automatically if the user is silent for more than 2 minutes, or after a configurable maximum call duration (default 40 minutes — same as voice call).
  • Authentication retries: up to 3 OTP entry attempts with OTP, or 3 global lookup attempts (default) with KBA, with separate per-step retries for malformed DTMF input.
  • DTMF window: callers have a 15-second window per prompt to enter a value, with # as the termination key for variable-length input.
  • Languages: phone call currently supports English. Additional languages will follow the same rollout sequence as voice call.
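The DTMF and auto-termination limits above can be expressed as two small checks. This sketch assumes that the 15-second window is measured from the moment the prompt ends and covers the whole entry including the # key; the source does not say whether the window is per prompt in total or per key press. The function names and the event format are hypothetical.

```python
DTMF_WINDOW_SECONDS = 15       # per-prompt entry window
SILENCE_LIMIT_SECONDS = 2 * 60  # auto-terminate after 2 minutes of silence
MAX_CALL_SECONDS = 40 * 60      # default maximum call duration
TERMINATOR = "#"


def collect_dtmf(events, prompt_time=0.0):
    """Collect digits from (timestamp_seconds, key) events until '#' arrives.

    Returns the digits entered, or None if the terminator does not arrive
    within the window that starts at prompt_time.
    """
    digits = []
    for timestamp, key in events:
        if timestamp - prompt_time > DTMF_WINDOW_SECONDS:
            return None  # window expired before the caller finished
        if key == TERMINATOR:
            return "".join(digits)
        if key.isdigit():
            digits.append(key)
    return None  # caller never pressed '#'


def should_terminate(seconds_since_last_speech, call_seconds,
                     max_call_seconds=MAX_CALL_SECONDS):
    """True when either the silence limit or the duration cap is reached."""
    return (seconds_since_last_speech > SILENCE_LIMIT_SECONDS
            or call_seconds >= max_call_seconds)
```

A None result from collect_dtmf would count as malformed input for that step, which is separate from the global lookup attempts.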