Voice Architecture
Key components
- Retell.ai: session management + real‑time ASR; provides interruption/breakpoint signals and streams text to Leena AI.
- Voice Orchestrator (perimeter fast path): GPT‑4.1 configured to:
  - Instantly acknowledge/clarify
  - Provide micro‑responses
  - Gate & forward complex turns to the core
- Core Orchestrator (Leena Autonomous Agent): planning, tool selection, execution, knowledge grounding, and long‑running job coordination.
- Text-to-Speech (TTS): Retell.ai primary; hot‑swappable secondaries for automatic failover; voice persona per customer.
Request Lifecycle
- Call start: Client app opens secure WebSocket (WSS) to Retell.ai; audio is streamed bi‑directionally.
- Real‑time ASR: Retell.ai emits partial/final transcripts to Leena over WSS (encrypted in transit).
- Perimeter fast‑ack: Voice Orchestrator consumes streaming text; issues instant natural acks (e.g. "Got it—checking your PTO") without blocking on the core.
- Routing:
  - Simple/safe turns (greetings, confirmations): Voice Orchestrator answers directly.
  - Transactional/complex turns: forwarded to the Core Orchestrator.
- Execution (Core): intent grounding -> plan -> tool/API calls (e.g. HRIS). Context kept minimal (see latency section) and cached when safe.
- Progress updates: Core streams machine‑readable status ("checking KB", "calling Time‑Off API"), which Voice Orchestrator rephrases as natural speech.
- Response delivery: Final text -> TTS (Retell.ai preferred) -> audio frames to Retell.ai -> back to user over the same WSS channel.
- Observability: For each call, Leena stores metadata, audio, chronological transcripts, summaries, and sentiment for analytics & debugging, all in the customer’s hosting region.
- Barge‑in/interrupts:
  - Retell.ai signals user interrupt events mid‑utterance.
  - Voice Orchestrator arbitrates: continue / cancel / queue relative to the Core’s in‑flight job, using Core state to determine if rollback is possible.
- Auto termination: the call is terminated automatically in two cases:
  - The user is silent for more than 2 minutes.
  - The call runs longer than 40 minutes (this limit is configurable).
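The routing and auto‑termination rules above can be sketched roughly as follows. This is an illustrative sketch only: `route_turn`, `CallClock`, and the phrase set are hypothetical names, and a fixed phrase list stands in for the GPT‑4.1 routing decision purely to keep the example self-contained.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the Voice Orchestrator's routing decision.
# In the described system GPT-4.1 makes this call; a phrase set is an assumption.
SIMPLE_PHRASES = {"hi", "hello", "thanks", "thank you", "yes", "no", "okay"}

SILENCE_LIMIT_S = 2 * 60        # terminate after more than 2 minutes of silence
MAX_CALL_DURATION_S = 40 * 60   # default cap of 40 minutes (configurable)


def route_turn(transcript: str) -> str:
    """Return "voice" for simple/safe turns and "core" for everything else."""
    normalized = transcript.strip().lower().rstrip(".!?")
    return "voice" if normalized in SIMPLE_PHRASES else "core"


@dataclass
class CallClock:
    started_at: float       # seconds on any monotonic clock
    last_speech_at: float   # last time the user was heard

    def should_terminate(
        self, now: float, max_duration_s: float = MAX_CALL_DURATION_S
    ) -> bool:
        """True if either auto-termination condition holds at time `now`."""
        silent_too_long = now - self.last_speech_at > SILENCE_LIMIT_S
        call_too_long = now - self.started_at > max_duration_s
        return silent_too_long or call_too_long
```

A greeting such as "Hello!" would be answered by the Voice Orchestrator, while a PTO balance question would be forwarded to the Core. Passing `max_duration_s` per call mirrors the configurable duration limit.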
Latency design (why it feels instant)
We achieve near‑instant acks (shorter than a natural human pause), smooth turn‑taking, and fast task completion via:
- Perimeter fast path using GPT‑4.1 to produce immediate, context‑aware acks and simple replies.
- Streaming ASR: text tokens arrive as the user speaks; no end‑of‑speech blocking.
- Progressive disclosure: user hears meaningful updates while the Core executes.
- Context diet: Core strictly limits prompt/context size per turn; uses selective retrieval and memoized tool results (caching) where permissible.
- Parallelism: plan & tool‑prep in parallel with TTS buffering of non‑critical preambles.
- Vendor locality roadmap: future regional ASR/TTS to reduce RTT where needed.
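To make the fast path and progressive disclosure concrete, here is a minimal asyncio sketch. Everything in it is assumed for illustration: `core_execute` is a stub for the Core Orchestrator, the status codes and phrases in `STATUS_PHRASES` are invented, and the final answer is placeholder text. The point is only the ordering: the Core starts working as a background task, the ack is spoken without waiting on it, and status events are rephrased as they arrive.

```python
import asyncio
from typing import AsyncIterator, Callable

# Hypothetical mapping from machine-readable Core status to natural speech.
STATUS_PHRASES = {
    "checking_kb": "Let me look that up.",
    "calling_time_off_api": "I'm pulling your time-off balance now.",
}


async def core_execute(request: str) -> AsyncIterator[str]:
    """Stub Core: yields status events, then a final answer (placeholder data)."""
    for status in ("checking_kb", "calling_time_off_api"):
        await asyncio.sleep(0.01)  # simulated planning / tool latency
        yield f"status:{status}"
    yield "final:You have 12 PTO days left."


async def handle_turn(request: str, speak: Callable[[str], None]) -> None:
    events: asyncio.Queue = asyncio.Queue()

    async def run_core() -> None:
        async for event in core_execute(request):
            await events.put(event)
        await events.put(None)  # sentinel: Core is done

    core_task = asyncio.create_task(run_core())  # Core runs in the background
    speak("Got it, checking your PTO.")          # ack does not await the Core

    while (event := await events.get()) is not None:
        kind, _, payload = event.partition(":")
        if kind == "status":
            speak(STATUS_PHRASES.get(payload, "Still working on it."))
        else:
            speak(payload)
    await core_task
```

In a real deployment `speak` would hand text to TTS; here any callable works, for example `asyncio.run(handle_turn("PTO balance?", print))`.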
Dialogue quality: prosody, sentiment, and control
- Naturalness default: modern TTS voices sound humanlike even without SSML.
- Production control: for long or sensitive reads (spelling, reset steps), we apply SSML (pace, pauses, emphasis, repeat) on top of vendor defaults.
- Sentiment alignment: Voice Orchestrator infers user sentiment and mirrors tone (e.g. upbeat for holidays, calm for issues) by choosing phrasing and SSML cues.
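As an illustration of the SSML production control mentioned above, the sketch below builds markup that reads a code character by character with pauses at a slow pace. The helper name `spell_out` and the default pause length are assumptions. The elements used (`speak`, `prosody`, `say-as`, `break`) come from the W3C SSML standard, but which of them a given TTS vendor honors should be verified.

```python
from xml.sax.saxutils import escape


def spell_out(code: str, pause_ms: int = 300) -> str:
    """Return SSML that spells `code` one character at a time, slowly."""
    pause = f'<break time="{pause_ms}ms"/>'
    chars = pause.join(
        # escape() guards against characters such as & or < in the input
        f'<say-as interpret-as="characters">{escape(ch)}</say-as>'
        for ch in code
    )
    return f'<speak><prosody rate="slow">{chars}</prosody></speak>'
```

For short conversational replies the plain text would be sent as-is; markup like this is reserved for reads where pacing matters, such as reset codes or spelled names.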
Security, privacy, and data residency
- Transport security: all hops use bi‑directional WSS (WebSocket over TLS):
  - Client to Retell.ai (audio up)
  - Retell.ai to Leena (transcripts)
  - Leena to TTS to Retell.ai (audio down)
- Retention controls at the subprocessor: Retell.ai is configured not to persist call data beyond a minimal operational window; Leena stores authoritative logs.
- Hosting region: Leena persists audio/transcripts in the same region as the customer’s Leena tenant.
- Constraint: Retell.ai currently hosts in US regions only, which can block customers with strict data‑localization requirements (e.g. some Middle East countries, EU‑only mandates). Retell.ai is, however, configured not to retain call data such as call logs.
- Roadmap: evaluate in‑house / open‑source voice stack with pluggable ASR/TTS to unlock additional hosting regions and full data‑plane control.
