Case Study: AI Chatbot using RAG

When building basic Large Language Model (LLM) chatbots, we identified two major limitations that prevent many clients from leveraging their benefits: they are bad at answering questions that depend on highly dynamic datasets, and they hallucinate (make up facts). We tackled both challenges using Meta’s Retrieval Augmented Generation (RAG) design pattern. RAG is a game changer!

This case study describes a RAG-based AI chatbot solution that automated the answers to 82% of our client’s employee-related HR questions.

How does Retrieval Augmented Generation work?

RAG implementations supercharge LLM-based chatbots by supplying relevant information alongside the user’s question. This produces more accurate answers than a standard LLM-based chatbot. RAG also tells the user which sources were used to generate the answer, so the user can perform a quick fact-check.

A RAG implementation follows these standard steps: 

  1. The user asks a question

  2. You fetch contextual data relevant to this question

  3. If relevant data is found, you package this data alongside the user’s question

  4. You send this package to the LLM, which returns a text-based answer

  5. You then return this answer to the user (alongside the supplementary relevant data used to generate the answer)

This design pattern has two main advantages over standard LLM-based chatbots. First, because RAG supplements the user’s question with relevant data, the LLM’s answers are more grounded in reality. Second, because RAG shows the user the context used to generate the answer, the user can quickly fact-check it.
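To make the flow concrete, here is a minimal Python sketch of those five steps. The `fetch_context` and `call_llm` helpers are hypothetical placeholders; the real implementations depend on your retrieval stack and LLM provider.

```python
# Minimal sketch of the five-step RAG flow described above.
# `fetch_context` and `call_llm` are hypothetical placeholders -- the actual
# retrieval and LLM calls depend on your stack (vector DB, HRIS API, model provider).

def fetch_context(question: str) -> list[str]:
    """Step 2: return any data relevant to the question (may be empty)."""
    raise NotImplementedError  # e.g. semantic search + HRIS lookup

def call_llm(prompt: str) -> str:
    """Step 4: send the packaged prompt to an LLM and return its text answer."""
    raise NotImplementedError  # e.g. an HTTP call to your model provider

def answer(question: str) -> dict:
    # Step 1: the user's question arrives here.
    context = fetch_context(question)  # Step 2: fetch relevant contextual data
    if context:
        # Step 3: package the context alongside the user's question.
        prompt = (
            "Answer the question using only the context below.\n\n"
            "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
        )
    else:
        prompt = question
    llm_answer = call_llm(prompt)  # Step 4: the LLM returns a text-based answer
    # Step 5: return the answer alongside the supplementary data used to generate it.
    return {"answer": llm_answer, "sources": context}
```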

The context fetching process is RAG’s “secret sauce”. Context fetching finds the relevant data that helps the LLM generate a better answer, and this process is the main difference between RAG and a standard LLM-based chatbot.

How did we implement context fetching?

There are a few ways to build the context fetching process. In this project, the client needed a chatbot that could answer their employees’ HR-related questions. So, we built a system that performed two different types of context fetching. First, we made an API call to the company’s HRIS to collect the logged-in employee’s account data (this returned information such as accrued PTO hours, along with salary and benefits data). Second, we ran a semantic search over their internal HR policy documents to find any information relevant to the question being asked.

The flow diagram below shows these two context fetching processes. Note how they operate in parallel to find both employee-specific data and company-wide data.
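As a rough illustration of this parallel fetching, the sketch below runs both branches concurrently. The HRIS endpoint, helper names, and authentication scheme are illustrative assumptions, not the client’s actual API.

```python
# Sketch of the two context-fetching branches running in parallel.
# `HRIS_API_URL`, `fetch_employee_data`, and `semantic_search` are illustrative
# names and placeholders, not the client's actual endpoints or helpers.
from concurrent.futures import ThreadPoolExecutor

import requests

HRIS_API_URL = "https://hris.example.com/api/employees"  # placeholder endpoint

def fetch_employee_data(employee_id: str, token: str) -> dict:
    """Branch 1: pull the logged-in employee's PTO, salary, and benefits data."""
    resp = requests.get(
        f"{HRIS_API_URL}/{employee_id}",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

def semantic_search(question: str) -> list[str]:
    """Branch 2: find HR policy chunks relevant to the question (sketched further below)."""
    raise NotImplementedError

def fetch_context(question: str, employee_id: str, token: str) -> dict:
    # Run both branches concurrently, as in the flow diagram.
    with ThreadPoolExecutor(max_workers=2) as pool:
        employee_future = pool.submit(fetch_employee_data, employee_id, token)
        policy_future = pool.submit(semantic_search, question)
        return {
            "employee_data": employee_future.result(),
            "policy_chunks": policy_future.result(),
        }
```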

The API call used to fetch employee data is straightforward to implement for anyone familiar with developing applications, so it is not discussed in detail here. However, the semantic search process is more specific to working with LLMs, so we dive deeper into it below.

What is Semantic Search?

Semantic search numerically encodes the intent behind the user’s question and leverages this encoding to find relevant data. Specifically, in this project we implemented semantic search using an out-of-the-box vector embedding algorithm.
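As a toy illustration of what “encoding intent” means, the snippet below embeds two phrasings of the same request plus an unrelated one and compares them with cosine similarity. The sentence-transformers model named here is an assumption; the project simply used an off-the-shelf embedding algorithm.

```python
# Illustration of "numerically encoding intent": embed two phrasings of the
# same request and one unrelated request, then compare with cosine similarity.
# The model name below is an assumption -- any off-the-shelf embedding model works.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

question, similar, unrelated = model.encode([
    "How many vacation days do I have left?",
    "Remaining PTO balance for the current year",
    "Where is the cafeteria?",
])

print(cosine(question, similar))    # high score: same intent, different wording
print(cosine(question, unrelated))  # lower score: different intent
```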

How did we implement Semantic Search?

Before we could search for documents relevant to a user’s question, we first had to establish a pipeline to store the client’s HR policy documents in a searchable format. We built this document-storage pipeline with the following steps (also visualized below in the gray box with steps labelled “a”; a code sketch follows the list):

  1. Extract all of the text from the documents (we used OCR to extract the text from PDFs where we didn’t have raw text, but the other documents were more straightforward)

  2. Split the extracted text into chunks

  3. Encode these chunks with a vector embedding algorithm

  4. Write these encoded chunks into a vector database
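Here is a minimal sketch of those steps, assuming Chroma as the vector database and sentence-transformers for the embeddings; the case study does not name the actual tools, so treat these as stand-ins. Step 1’s text extraction (including OCR for PDFs) is assumed to have already produced raw text.

```python
# Minimal sketch of the document-storage pipeline (steps 1-4 above).
# Chroma and the embedding model are stand-ins -- the case study does not
# name the vector database or embedding algorithm actually used.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
collection = client.create_collection("hr_policies")

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Step 2: split the extracted text into overlapping chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size - overlap)]

def ingest(doc_id: str, raw_text: str) -> None:
    # Step 1 (text extraction / OCR) is assumed to have produced `raw_text`.
    chunks = chunk_text(raw_text)                # Step 2: chunk the text
    embeddings = model.encode(chunks).tolist()   # Step 3: encode the chunks
    collection.add(                              # Step 4: write to the vector database
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )
```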

Once the data had been loaded into the vector database, we built a pipeline to retrieve relevant documents using semantic search. This pipeline is shown above in the green box (with steps labelled “b”), and it includes the following steps (sketched in code after the list):

  1. Extract the text from the question that the user has asked

  2. Run this extracted text through the same vector embedding algorithm that was used to encode the document chunks when loading them into the vector database. This step returns a vector.

  3. Search the vector database for the text chunks closest to this vector

  4. Filter out any returned document chunks that are not relevant to the question being asked

  5. Return any text chunks remaining after the filtration process
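A matching sketch of this retrieval pipeline, under the same assumed stack as the ingestion sketch (Chroma plus sentence-transformers); the distance threshold used for the relevance filter is illustrative and would need tuning against real questions.

```python
# Sketch of the retrieval pipeline (steps 1-5 above), reusing the same assumed
# stack as the ingestion sketch. The relevance threshold is illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.Client().get_or_create_collection("hr_policies")

def semantic_search(question: str, top_k: int = 5, max_distance: float = 0.8) -> list[str]:
    # Steps 1-2: take the question text and run it through the same embedding
    # model used at ingestion time, producing a query vector.
    query_vector = model.encode([question]).tolist()
    # Step 3: ask the vector database for the closest text chunks.
    results = collection.query(query_embeddings=query_vector, n_results=top_k)
    chunks = results["documents"][0]
    distances = results["distances"][0]
    # Step 4: filter out chunks that are too far away to be relevant.
    relevant = [chunk for chunk, dist in zip(chunks, distances) if dist <= max_distance]
    # Step 5: return whatever remains after the filtration process.
    return relevant
```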

Outcome

Standard LLM-based chatbots are bad at answering questions that depend on highly dynamic datasets, and they can hallucinate (make up facts). RAG is a low-lift design pattern that overcomes both hurdles. This particular solution automated the answers to 82% of our client’s employee-related HR questions.

Reach out for a free consultation if your organization needs any assistance in developing its own RAG-based LLM chatbot.
