Lifecycle of knowledge articles in Leena KM

Overview

This document provides an overview of the knowledge articles that can be managed within Leena AI's knowledge management system. It outlines the complete lifecycle of a knowledge article, from ingestion into the system to its final indexing. Additionally, it delves into the processes involved in handling these articles and addresses queries related to their processing.

Knowledge Articles in Leena's Knowledge Management System (KMS)

Knowledge articles are the foundational information source for enterprise virtual assistants and play a vital role as training data for machine learning models. 

Knowledge articles input methods

The Knowledge Management Dashboard supports three ways for users to import their data -

  1. Manual data upload - This includes uploaded PDFs, PPTs, DOCX files, Excel/CSV files, images, written articles (articles created from scratch in the KM dashboard) and web articles/HTML.
  2. Automatic data sync via Connectors - This includes fetching knowledge articles from third-party knowledge bases such as SharePoint REST, SharePoint Graph, Confluence, Google Drive, ServiceNow, Box and any custom connector. Leena AI also supports an API-based connector, allowing users to push their data into Leena KMS and maintain control on their end.
  3. Web Integration - Leena AI can crawl or scrape data from public websites, allowing users to integrate relevant information directly into their knowledge base. This feature supports a variety of use cases, including Support Website Integration (such as https://support.atlassian.com/) and Company Website Integration (such as https://www.leena.ai). Refer to this for more information.
    Note - This functionality is subject to the website structure. Some websites might need custom configurations for data scraping. Kindly contact the Leena admin for these.

Permissions of 'Knowledge Articles'

Users can set permissions or restrictions on ingested knowledge articles, which we at Leena AI refer to as 'audiences.' For manually uploaded articles, these restrictions can be created and managed using the KMS dashboard. For instance, if an article is intended only for managers, users can define an audience based on employee parameters, such as setting user.IsManager to 'true.' Once this manager-specific audience is linked to the article, it will ensure the article is accessible only to managers. Non-managers will neither see the article nor receive responses from it if queried.
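
The audience check described above can be sketched as follows. This is a minimal illustration, not Leena AI's implementation: the Audience class, the can_access function, and the rule that an article without any audience is open to everyone are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Audience:
    """Hypothetical audience: every listed user parameter must match."""
    name: str
    rules: dict = field(default_factory=dict)

    def includes(self, user: dict) -> bool:
        # A user belongs to the audience only if all rules match exactly.
        return all(user.get(key) == value for key, value in self.rules.items())


def can_access(article_audiences: list, user: dict) -> bool:
    # Assumption: an article with no audience attached is open to everyone.
    if not article_audiences:
        return True
    return any(audience.includes(user) for audience in article_audiences)


# Audience equivalent to setting user.IsManager to 'true'.
managers = Audience(name="Managers", rules={"IsManager": True})

print(can_access([managers], {"name": "Asha", "IsManager": True}))   # True
print(can_access([managers], {"name": "Ravi", "IsManager": False}))  # False
```

The same check would gate both viewing the article and using it to answer a query, which is why non-managers get neither.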

For connector articles, these audience permissions are automatically fetched from the connected knowledge base or source. Leena AI replicates and enforces these permissions on its platform, ensuring they align with the source article's restrictions.

Custom Filters on Article Permissions
Leena AI lets you add extra filters on top of article permissions for connector articles. For instance, all SharePoint articles may be accessible to everyone, yet some documents might be categorised by country in SharePoint (e.g., APAC, France). Although these documents show a general "all users" permission in the KM dashboard, applying custom filters in the AI pipeline based on user parameters (such as user.country = "France") ensures that a French employee only gets bot responses from documents marked as "France" or global. The French employee can still view all documents in the bot because the permission is set to "all users", but query responses are customised for them based on the configured filter, which is user.country in this instance.
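
A rough sketch of such a query-time filter is shown below. The function name filter_for_query, the "global" tag value, and the document dictionaries are hypothetical; the actual pipeline configuration is handled by Leena AI and may differ.

```python
def filter_for_query(documents, user, param="country", global_tag="global"):
    """Keep documents tagged with the user's value for `param`, or tagged global."""
    user_value = str(user.get(param, "")).lower()
    allowed = {user_value, global_tag}
    # Assumption: a document without the tag is treated as global.
    return [doc for doc in documents if doc.get(param, global_tag).lower() in allowed]


documents = [
    {"title": "Leave policy (France)", "country": "France"},
    {"title": "Leave policy (APAC)", "country": "APAC"},
    {"title": "Code of conduct", "country": "global"},
]
user = {"name": "Camille", "country": "France"}

titles = [doc["title"] for doc in filter_for_query(documents, user)]
print(titles)  # ['Leave policy (France)', 'Code of conduct']
```

Note that the filter only narrows what the bot answers from. It does not change who can view the documents, since the permission remains "all users".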

Sync Frequency

The virtual assistant is synced whenever the knowledge articles change. For manually uploaded articles, users can edit them in the KM dashboard and publish the latest version. The virtual assistant sync then runs, and users can monitor it in the dashboard.

Articles fetched via connectors, along with their permissions, can be synced automatically at a predefined frequency. Users can choose to sync daily, weekly, or at a custom interval (minimum 24 hours), such as twice a week at a specified time. This ensures the virtual assistant stays up to date with the latest knowledge. Connector articles can also be synced manually if required.
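
To illustrate the 24-hour minimum on custom intervals, here is a small sketch of how the next sync time could be computed. The next_sync function and the MIN_INTERVAL constant are hypothetical names for this example; the real scheduler is configured in the dashboard.

```python
from datetime import datetime, timedelta

# Minimum custom interval, as stated above.
MIN_INTERVAL = timedelta(hours=24)


def next_sync(last_sync: datetime, interval: timedelta) -> datetime:
    """Return the next scheduled sync time, rejecting intervals under 24 hours."""
    if interval < MIN_INTERVAL:
        raise ValueError("sync interval must be at least 24 hours")
    return last_sync + interval


last = datetime(2025, 1, 6, 2, 0)
print(next_sync(last, timedelta(days=1)))   # 2025-01-07 02:00:00
print(next_sync(last, timedelta(weeks=1)))  # 2025-01-13 02:00:00
```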

Web integrations can also be synced automatically on a weekly or monthly basis, i.e. every 7 days or every 30/90 days. They can be synced manually as well if required.

Parsing of 'Knowledge Articles'

Parsing involves extracting and converting content from PDF documents into an HTML format, enabling better accessibility and utilization of the information. This process ensures that data from various sources is effectively structured and ready for further analysis or integration into different platforms.

Parsing Process

The parsing process consists of the following steps -

  1. Identifying Bounding Boxes: The first step involves detecting the bounding boxes in the content. During this phase, labels such as Figures, Headings, List Items, Paragraphs, and Tables are assigned to different sections of the document.
  2. Extracting Text and Images: Based on the predicted content hierarchy, text and image content are extracted. Heading tags are created for the identified headings, while other content is organized under the appropriate heading tags.
  3. Converting to HTML: Finally, the extracted text and images are converted into HTML format, ensuring the content is structured and formatted for further use.
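
The last two steps can be sketched as below, assuming the layout model has already produced a list of labeled blocks. The block dictionaries, the "level" and "src" keys, and the blocks_to_html function are assumptions for illustration and do not reflect the Leena AI parser's internals. Tables are left out for brevity; any unrecognized label falls back to a paragraph.

```python
from html import escape


def blocks_to_html(blocks):
    """Convert labeled layout blocks into a simple HTML fragment."""
    parts = []
    in_list = False
    for block in blocks:
        label, text = block["label"], escape(block.get("text", ""))
        if label == "List Item":
            # Open a list on the first consecutive list item.
            if not in_list:
                parts.append("<ul>")
                in_list = True
            parts.append(f"<li>{text}</li>")
            continue
        if in_list:
            # Any non-list block closes the open list.
            parts.append("</ul>")
            in_list = False
        if label == "Heading":
            level = block.get("level", 1)
            parts.append(f"<h{level}>{text}</h{level}>")
        elif label == "Figure":
            parts.append(f'<img src="{escape(block["src"])}" alt="{text}">')
        else:
            parts.append(f"<p>{text}</p>")
    if in_list:
        parts.append("</ul>")
    return "\n".join(parts)


blocks = [
    {"label": "Heading", "text": "Leave Policy", "level": 1},
    {"label": "Paragraph", "text": "Employees accrue leave monthly."},
    {"label": "List Item", "text": "Casual leave"},
    {"label": "List Item", "text": "Sick leave"},
]
html = blocks_to_html(blocks)
print(html)
```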

The Leena AI parser achieves an mAP (mean Average Precision) of 95+, compared with around 90 for commonly available open-source alternatives, particularly for layout detection. However, the mAP can be affected by the complexity of the ingested documents. On average, the parser takes around 8-10 seconds to parse one PDF page.
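
As a rough worked example of the per-page figure, the snippet below estimates the parsing time for a 50-page PDF. It assumes pages are parsed one after another at 8-10 seconds each, which may not hold in practice; the function name is hypothetical.

```python
def estimate_parse_seconds(pages, low=8, high=10):
    """Return (best, worst) estimated parse time in seconds for a PDF."""
    return pages * low, pages * high


low, high = estimate_parse_seconds(50)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")  # 6.7 to 8.3 minutes
```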

The parsing process for different article types is as follows -

Note - Written articles and web (HTML) articles are not parsed, as their content is already in HTML format.

Users can access and review the parsed content within Leena KMS, with the option to edit it if needed. Since the virtual assistant relies on this parsed content to generate responses, ensuring its accuracy is crucial for delivering precise and relevant information.

Indexing of 'Parsed content' of the knowledge articles

Once the knowledge articles are parsed into HTML format, the content undergoes indexing in Elasticsearch. This indexing process includes a crucial step called vectorization, where the article's sections are transformed into numerical representations known as 'vectors'.

Leena AI utilizes 'MiniLM embeddings', a machine learning model that generates these vector embeddings based on the semantic meaning of the text. This means that sentences with similar meanings will have vector representations that are close to each other in the multidimensional space.

By storing and retrieving content using these vectorized representations, Leena AI ensures semantic search capability, allowing the virtual assistant to retrieve the most contextually relevant responses.
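
The idea of "close" vectors can be illustrated with cosine similarity, a common way to compare embeddings. The sketch below uses made-up three-dimensional vectors in place of real MiniLM embeddings (which have hundreds of dimensions), and cosine similarity is an assumption here; the source does not state which similarity measure Leena AI uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for the embeddings of two indexed sections.
sections = {
    "How to apply for leave": [0.9, 0.1, 0.0],
    "Expense reimbursement steps": [0.1, 0.9, 0.2],
}
# Toy vector standing in for the query "How do I request time off?"
query_vector = [0.8, 0.2, 0.1]

best = max(sections, key=lambda title: cosine_similarity(query_vector, sections[title]))
print(best)  # How to apply for leave
```

The query shares no keywords with the winning section title, yet its vector points in nearly the same direction, which is the behaviour semantic search relies on.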

The index for a knowledge article is updated whenever the article's content changes. This is an automatic pipeline and does not require any user intervention.

Article expiry in KM

For manually uploaded articles, users can set an expiry date within Leena KMS. Once an article reaches its expiry date, it is automatically archived but can be restored if needed.
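
The expiry behaviour for manually uploaded articles can be sketched as a periodic check like the one below. The article dictionaries, the "status" and "expiry_date" fields, and the archive_expired function are hypothetical and only illustrate the rule that an article is archived once its expiry date is reached.

```python
from datetime import date


def archive_expired(articles, today):
    """Archive active articles whose expiry date has been reached."""
    for article in articles:
        expiry = article.get("expiry_date")
        # Articles without an expiry date never expire.
        if article["status"] == "active" and expiry is not None and expiry <= today:
            article["status"] = "archived"
    return articles


articles = [
    {"title": "2024 holiday list", "status": "active", "expiry_date": date(2024, 12, 31)},
    {"title": "Travel policy", "status": "active", "expiry_date": None},
]
archive_expired(articles, today=date(2025, 1, 1))
print([a["status"] for a in articles])  # ['archived', 'active']
```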

For connector-based articles, users cannot set an expiry date within Leena KMS, since the source system (for example, SharePoint) controls the document lifecycle. If an article is archived in Leena KMS but still exists in the connector source, it will be restored during the next sync. However, if expiry dates are stored as metadata at the connector's source (SharePoint), Leena AI can fetch this information and sync only active articles, ensuring alignment with the source system's data retention policies.

Multi-lingual articles

Leena AI supports over 180 languages for the virtual assistant's knowledge base. Users can define the supported languages in the KMS dashboard and upload or fetch articles accordingly. The virtual assistant can process queries in any of the selected languages and provide responses in the same language, ensuring a seamless multilingual experience.

Note - These supported languages are only for the user query and bot response. For translating the overall bot experience, please contact Leena admin.