Response time breakdown for text queries

How the System processes and answers user queries on text-based channels (for example, MS Teams, Slack, Desktop App, Web App, Mobile App)

Overview

This document covers how text-based user queries are handled by the System. Voice and other channels follow a different pipeline and will be covered separately.

It walks through, in simple terms, the end-to-end journey of a user request — from the moment the message is received to the moment the reply is delivered.

Each user message passes through a short sequence of steps before the System responds. Each step adds a small amount of time, and the total time the user waits is the sum of all these steps. The sections that follow describe each step, the role of the underlying AI model, and how the journey looks for different types of queries.
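The additive nature of these steps can be sketched as a simple sequential pipeline. The stage names and durations below are illustrative placeholders drawn from the typical figures in this document, not the System's actual internals:

```python
# Illustrative sketch: total response time is the sum of sequential stage times.
# Stage names and durations are placeholders, not the System's real implementation.

def total_response_time(stage_seconds):
    """Sum the per-stage latencies of a sequential pipeline."""
    return sum(stage_seconds.values())

stages = {
    "safety_checks": 3.0,       # optional guardrails
    "language_detection": 1.5,
    "action_selection": 3.0,    # re-ranking
    "action_execution": 5.0,    # e.g. a knowledge-base lookup
    "reply_synthesis": 4.5,
}

total = total_response_time(stages)  # 17.0 seconds for this example
```

Because the stages run one after another, shaving time off any single stage directly reduces the total the user waits.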


The journey of a user message

When a user types a question, the System performs the following steps in order:

Step 1 — Safety and security checks (Guardrails)

Before anything else, we check the message for things like inappropriate language, attempts at misuse, or sensitive personal information. This protects both the user and the organisation. This step is optional and is enabled based on the System's configuration.

Typical time: 2 to 4 seconds (when enabled).

Step 2 — Language detection

We identify what language the user is writing in, so the System can reply in the same language.

Typical time: 1 to 2 seconds.

Step 3 — Understanding the request and choosing the right action (Re-ranking)

The Orchestrator has to decide what to do with the message. It might need to look up an answer in the knowledge base, raise a ticket, check a calendar, run a workflow, or simply reply directly.

To make this decision well, we share with the underlying AI model a list of all the possible actions available to the Orchestrator — including the names of those actions, what they do, and the inputs each one requires. The model then picks the most relevant one.

The more actions and inputs the Orchestrator has access to, the more information has to be shared and evaluated, and the longer this step takes.

Typical time: 2 to 5 seconds.
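To make the re-ranking step concrete, here is a hypothetical sketch of the kind of action catalogue shared with the model. The field names and actions are illustrative assumptions, not the System's actual schema:

```python
# Hypothetical action catalogue shared with the model during re-ranking.
# Names, descriptions, and inputs are illustrative, not the System's real schema.
actions = [
    {
        "name": "kb_lookup",
        "description": "Search the knowledge base and assemble an answer",
        "inputs": ["query"],
    },
    {
        "name": "raise_ticket",
        "description": "Create a ticket in the connected ITSM system",
        "inputs": ["summary", "category"],
    },
    {
        "name": "check_calendar",
        "description": "Look up an entry in the user's calendar",
        "inputs": ["date"],
    },
]

# The more actions (and inputs) in the catalogue, the more text the model must
# read before choosing one — which is why step 3 grows with catalogue size.
catalogue_size = sum(
    len(a["name"]) + len(a["description"]) + sum(len(i) for i in a["inputs"])
    for a in actions
)
```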

Step 4 — Fetching information or running the action

Once the action is chosen, the System carries it out. A few examples:

  • Looking up an answer in the knowledge base — typically 3 to 8 seconds, since this involves searching, ranking the most relevant documents, and assembling an answer.
  • Initiating a skill or tool call, or an AOP — for example, raising a time-off request, looking up a ticket, checking a calendar entry, or running a multi-step workflow. Time taken depends on the number of steps involved and the response time of the connected system (typically 0.5 to 3 seconds for a single tool/skill call, longer for multi-step workflows). The response time of the connected system itself can be a significant variable — fast APIs respond in under a second, while some native enterprise platforms (e.g. Workday) can take considerably longer depending on the volume or complexity of data being queried.
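The point that the connected system's latency dominates a tool call can be sketched as follows. `call_connected_system` is a hypothetical stand-in for a real ITSM/HCM API client, and the simulated delay is illustrative:

```python
# Minimal sketch of step 4: the time for a tool/skill call is dominated by the
# connected system's own response time. `call_connected_system` is a hypothetical
# stand-in for a real third-party API client; the delay is simulated.
import time

def call_connected_system():
    time.sleep(0.05)           # simulate a fast third-party API (~50 ms)
    return {"status": "ok"}

start = time.monotonic()
result = call_connected_system()
elapsed = time.monotonic() - start  # this portion is governed by the third party
```

Whatever the orchestration overhead, `elapsed` cannot be smaller than the third-party response time — which is why a slow downstream platform lengthens the total regardless of how fast the rest of the pipeline runs.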

Step 5 — Synthesising the reply

Finally, the AI model synthesises the reply that the user sees. This is the single biggest contributor to total response time. The model writes the answer one small piece at a time, and longer replies naturally take longer to produce.

Typical time: 3 to 6 seconds for a normal reply, and longer for detailed answers.

Synthesising the reply for Knowledge Responses: When the answer needs to be assembled from multiple knowledge-base sources, or includes references from previous context in the conversation, the System synthesises the response to combine them into a single, coherent reply for the user. This step accounts for 5 to 10 seconds for Knowledge Management related queries.
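The relationship between reply length and synthesis time can be sketched with a simple model: the reply is emitted in small chunks (tokens), each taking roughly constant time. The per-token figure below is an illustrative assumption, not a measured value for the System:

```python
# Sketch of why longer replies take longer: the model emits the reply token by
# token, each taking roughly constant time. The per-token rate is illustrative.
def synthesis_time(reply_tokens, seconds_per_token=0.02):
    """Estimated seconds to stream a reply of the given length."""
    return reply_tokens * seconds_per_token

short_reply = synthesis_time(150)  # roughly a short, direct answer
long_reply = synthesis_time(500)   # roughly a detailed, multi-source answer
```

Under this model a reply three times as long takes three times as long to produce, which is why concise response styles feel noticeably faster.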


The role of the underlying AI model

The AI model(s) that power the System have a direct effect on response time. Three factors matter most:

Speed of the model

Different models write replies at different speeds. Broadly, there are two categories:

  • Lightweight, faster models — well-suited to routine tasks like FAQ answers, simple policy lookups, and short responses. They produce replies noticeably faster.
  • Higher-reasoning, more capable models — better at complex, multi-step queries where careful reasoning matters, but they generate replies more slowly.

Leena AI determines which model to use for each task based on the System's setup; this is configured at a client's environment/instance level.

Reasoning / thinking time

For complex queries, AI models often "think" before they start writing — producing internal reasoning that the user does not see, but which still takes time. The deeper the reasoning, the better the quality of answers to hard queries, but also the longer the wait before the visible reply begins.

Length of the reply

Regardless of which model is used, longer replies always take longer to produce. Concise, direct answers are faster to generate than long, formal ones — so how the System is instructed to respond also affects perceived speed.


Sample response journeys by query type

Total response time depends heavily on what the System needs to do. The three journeys below show typical timing for the most common query types. The first three steps — safety checks, language detection, and choosing the action — are common to all of them; the differences come after.

Journey 1 — A knowledge-base only query

Example: "What is our work-from-home policy?"

The System looks up the answer in the knowledge base and responds. No external applications are involved at run-time.

| Stage | Approx. time | What is happening |
| --- | --- | --- |
| Safety checks | ~ 2–4 sec (optional) | Screening the message |
| Language detection | ~ 1–2 sec | Identifying the user's language |
| Choosing the action | ~ 2 sec | Deciding to use the knowledge base |
| Knowledge base tool/skill | ~ 4–8 sec | Searching, ranking documents, assembling an answer, finding sources |
| Synthesising the reply | ~ 5–10 sec | Combining content from multiple KB sources or previous context into a coherent reply |
| Total | ~ 14–26 sec | |
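The Journey 1 total can be verified by summing the per-stage bounds — the lower bounds add to the fastest plausible response and the upper bounds to the slowest:

```python
# Checking the Journey 1 total: each stage contributes a (min, max) range in
# seconds, and the end-to-end total is the sum of the per-stage bounds.
journey_1 = {
    "safety_checks": (2, 4),
    "language_detection": (1, 2),
    "choosing_the_action": (2, 2),
    "kb_tool_skill": (4, 8),
    "synthesising_the_reply": (5, 10),
}

total_min = sum(lo for lo, hi in journey_1.values())  # fastest plausible: 14 sec
total_max = sum(hi for lo, hi in journey_1.values())  # slowest typical: 26 sec
```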

Journey 2 — A query that uses a tool/skill call

Example: "Raise an IT ticket for my laptop issue" or "What is the status of my last leave request?"

The System needs to call a connected system — an ITSM or HCM platform — to read or write data, and then respond based on the result.

| Stage | Approx. time | What is happening |
| --- | --- | --- |
| Safety checks | ~ 2–4 sec (optional) | Screening the message |
| Language detection | ~ 1–2 sec | Identifying the user's language |
| Choosing the action | ~ 2–5 sec | Identifying the right tool/skill from all available actions |
| Collecting required inputs | varies | Asking the user for any missing details (only if needed) |
| Calling the connected system | ~ 0.5 sec – several sec (depends on third-party system) | Reaching out to the third-party tool/skill and getting a response. Varies significantly with the connected system — some native enterprise platforms (e.g. Workday) can take longer depending on data volume. |
| Synthesising the reply | ~ 1–4 sec | Composing the answer using the tool/skill's result |
| Total | ~ 7–18 sec (longer if the connected system is slow) | End-to-end response time (excluding user input time) |

Note on third-party dependencies: The totals shown above assume typical third-party API response times. When the connected system itself is slow — for example, certain native enterprise platforms such as Workday whose APIs can vary based on the volume or complexity of data being queried — total response time can be substantially higher. That portion is governed by the third-party system rather than the System itself.

Journey 3 — A query that triggers an AOP (multi-step workflow / business process)

Note: AOPs are not enabled or created for all clients — they are configured selectively based on the use cases identified during implementation.

Example: "Help me onboard a new employee"

An AOP is a multi-step automated business process where the Orchestrator carries out a series of tasks — gathering inputs, calling several tools/skills, sometimes handing off to a specialised sub-agent — before producing a final response.

Because AOPs perform several steps, they take noticeably longer than a single-shot reply. The table below shows the typical shape of an AOP execution (assuming the AOP has a single step):

| Stage | Approx. time | What is happening |
| --- | --- | --- |
| Safety checks | ~ 2–4 sec (optional) | Screening the message |
| Language detection | ~ 1–2 sec | Identifying the user's language |
| Choosing the action | ~ 2–5 sec | Identifying which AOP to run |
| Hand-off to the AOP / sub-agent | ~ 1–3 sec | Loading the AOP's instructions and tools, and starting execution |
| Collecting required inputs | varies | Presenting form fields or questions to the user (if needed) |
| Executing each AOP step | ~ 10–15 sec per step | Calling tools, looking up data, processing each step in turn |
| Status updates to the user | ~ 1–4 sec each | Sharing progress messages so the user is not staring at a spinner |
| Final response | ~ 3–8 sec | Producing a summary or confirmation of what was done |
| Total | ~ 20–41 sec | End-to-end, depending on number of steps (assumption: single-step AOP) |

Note on hand-offs: Each time control is passed from the main agent to a specialised sub-agent (and back), there is a short hand-off cost — typically 3 to 5 seconds — to load the receiving agent's instructions and tools/skills. AOPs that involve multiple specialised sub-agents will accumulate more hand-off time than single-agent workflows.
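Putting the AOP figures together, a rough estimate for a multi-step AOP follows from the per-step and per-hand-off costs quoted above. The fixed base below bundles the non-repeating stages (safety checks through the final response, including the initial hand-off) and is an illustrative simplification:

```python
# Rough AOP estimate from the figures quoted above: ~10–15 sec per step and
# ~3–5 sec per additional sub-agent hand-off, on top of a fixed base covering
# safety checks, language detection, action choice, the initial hand-off,
# status updates, and the final response. Illustrative simplification only.
def aop_estimate(steps, extra_handoffs=0):
    """Return (min, max) end-to-end seconds for an AOP run."""
    base_lo, base_hi = 10, 26              # sum of the non-repeating stage bounds
    lo = base_lo + 10 * steps + 3 * extra_handoffs
    hi = base_hi + 15 * steps + 5 * extra_handoffs
    return lo, hi

single_step = aop_estimate(1)          # matches the table total of ~20–41 sec
three_step = aop_estimate(3, extra_handoffs=2)
```

This makes the scaling visible: each extra step widens the range by 10–15 seconds, and each extra sub-agent hand-off by another 3–5 seconds.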


Why response times vary

Even within the same System, two queries can take different amounts of time. Some common reasons:

  • Length of the answer — longer, more detailed responses naturally take longer to produce.
  • Complexity of the request — queries that need multiple steps, lookups, or workflows take longer than a direct reply.
  • Number of available actions — when the System has many possible skills and AOPs to choose from, the Orchestrator takes a little longer to decide what to do.
  • Knowledge base size and structure — very large or very long documents take longer to search and rank.
  • Workflows with multiple steps — if the System is collecting inputs, calling several tools/skills, or coordinating with other agents, each step adds time.
  • External applications — when the System calls a third-party application (for example, ITSM or HCM), the response time of that application is included in the total. This can vary significantly — fast APIs respond in under a second, while native enterprise platforms such as Workday can take longer depending on the volume of data being queried.

In summary

Every reply from the System is the result of several short steps — safety checks, language detection, deciding on an action, fetching information, and synthesising the response. Most of these steps take only a couple of seconds, but they add up.

We are continuously investing in reducing the time taken at each of these stages.