A Faster and More Accurate Method for Implementing RAG
1. The Limitations of Generative AI and LLM Utilization
As you may know, over the past one to two years, generative AI models based on various LLMs, such as OpenAI's ChatGPT, Google's Gemini, and Meta's LLaMA, have become widely popular. Naturally, businesses are increasingly adopting generative AI to enhance productivity.
Generative AI operates on LLMs pre-trained with vast datasets, allowing them to provide accurate answers to ordinary questions. However, when it comes to domain-specific expertise or queries requiring additional datasets, they may generate irrelevant or incorrect responses without traceable sources.
This phenomenon, known as hallucination, occurs when a model outputs false information as if it were factual, representing a significant limitation of general-purpose generative AI models.
For businesses, relying on incorrect information for decision-making is unacceptable, making the issue of hallucination a significant barrier to adopting generative AI in corporate environments.
To address this issue, companies typically employ prompt engineering, where specific instructions are tailored for the LLM, or fine-tuning, where model parameters are adjusted. However, prompt engineering has limitations, such as constraints on input length and the degree of performance improvement achievable. On the other hand, fine-tuning is technically complex, requires high-performance computing resources, and incurs significant costs.
2. Increasing Demand for LLM+RAG Services Targeting Corporate Internal Data
Retrieval-Augmented Generation (RAG) is one of several methods designed to overcome the limitations of generative AI. Simply put, it combines the information retrieval component (referred to as the Retriever) with the answer generation component to improve the accuracy of the final output. RAG involves converting a company's data into vectors and storing them in a database. When a user submits a query, it is transformed into a search query, and similar data is retrieved from the database to assist in generating the answer.
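The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: a toy bag-of-words vector stands in for a trained embedding model, and an in-memory list stands in for a vector database.

```python
# Minimal sketch of RAG's retrieval step: embed the query, score each
# stored document by cosine similarity, and return the closest matches.
# A toy term-frequency vector substitutes for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The vacation policy grants 15 days of paid leave per year.",
    "Deployment pipelines run nightly on the build server.",
    "Paid leave requests must be approved by your manager.",
]
print(retrieve("how many days of paid leave do I get", docs, k=2))
```

The retrieved documents are then passed to the LLM as context, so the generated answer is grounded in the company's own data rather than only in the model's training set.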
The advantage of RAG is that it allows user data to be incorporated quickly and cost-effectively without requiring additional model training, simply by adding a retriever. Moreover, since the embedded data is indexed and managed, it is possible to identify which data was referenced in generating the answer and even determine whether hallucination has occurred.
With RAG, answer accuracy improves significantly compared to using an LLM alone. As a result, many companies are adopting RAG to build Q&A systems over their internal data.
3. RAG Implementation Process
The complete process of implementing RAG, from connecting data sources to receiving answers, is outlined in the diagram below.
Each step can utilize tools provided by LangChain or various external tools as needed.
Key decisions, such as selecting the embedding model for documents or choosing a retrieval technique for implementing the retriever, offer multiple options and play a crucial role in improving the accuracy of RAG, along with prompt engineering.
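The end-to-end process can be outlined as a pipeline of stages. In the sketch below every function is a hypothetical stub standing in for a real component (document loader, text splitter, embedding model, vector store, retriever, LLM); the names are illustrative and do not correspond to any specific LangChain or ThanoSQL API.

```python
# Hypothetical RAG pipeline skeleton; each stub marks a stage where a
# real tool (LangChain loader/splitter, embedding model, vector DB, LLM)
# would plug in. Names are illustrative placeholders.
def load_documents(source: str) -> list[str]:
    # In practice: loaders for PDFs, wikis, databases, etc.
    return [f"document from {source}"]

def split(docs: list[str], chunk_size: int = 40) -> list[str]:
    # In practice: a splitter that respects sentence/section boundaries.
    return [d[i:i + chunk_size] for d in docs for i in range(0, len(d), chunk_size)]

def embed_and_index(chunks: list[str]) -> dict[str, str]:
    # In practice: an embedding model plus a vector database.
    return {f"vec-{i}": c for i, c in enumerate(chunks)}

def retrieve(index: dict[str, str], query: str, k: int = 3) -> list[str]:
    # In practice: similarity search over the vector index.
    return list(index.values())[:k]

def generate(query: str, context: list[str]) -> str:
    # In practice: an LLM call with the retrieved context in the prompt.
    return f"Answer to {query!r} grounded in {len(context)} chunk(s)."

index = embed_and_index(split(load_documents("internal wiki")))
print(generate("What is our leave policy?", retrieve(index, "leave policy")))
```

Each stage is a swap point: the choice of splitter, embedding model, and retrieval technique at these seams is exactly where the accuracy gains discussed above are won or lost.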
4. High-Quality RAG Use Case
Let us examine a recent case where SmartMind implemented RAG using ThanoSQL for a client.
The project focused on developing a RAG chatbot to process the client's internal business documents. How could faster and more accurate responses be achieved than with a conventional RAG chatbot?
The general steps for implementing RAG mentioned earlier are well-documented online, with many excellent reference materials available. Therefore, we will not go into detailed explanations here.
In this post, we will discuss what specific steps were taken to enhance response quality when applying RAG in real-world corporate settings, focusing particularly on improving speed and accuracy during the 'retrieval' phase.
In most cases, when companies adopt RAG-based chatbots, the data targeted by RAG typically consists of the company's internal 'documents.'
However, not all documents are created equal; they can vary in several respects. Key differences include (1) the content of the document (e.g., related to specific tasks), (2) access permissions (e.g., confidential documents or those restricted to specific departments), and (3) whether the document can be disclosed (e.g., cases where copyright issues prevent revealing certain materials). To capture these differences, such information must be attached to the documents as metadata.
For example, consider a document titled 'S/W Development Methodology.' This document is related to development (document topic), accessible only to members of the development team (access permissions), and classified as confidential (disclosure status). Metadata reflecting these attributes would be added to the document. Another example is a document titled 'Company Life Guide,' which pertains to general company information, is accessible to all employees, and is publicly shareable. Metadata would also be added to represent these characteristics.
While it is possible to assign unique metadata to every document, this approach can become burdensome to manage. Instead, documents with the same metadata are grouped together and labeled as a 'collection.' Metadata is then assigned at the collection level, meaning all documents within a specific collection share the same metadata.
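The two example documents above could be organized into collections as follows. This is an illustrative sketch: the field names (`topic`, `access`, `disclosure`) are assumptions for the example, not a ThanoSQL schema.

```python
# Illustrative collection-level metadata, following the two examples in
# the text. Field names are assumptions, not a ThanoSQL schema.
collections = {
    "dev-docs": {
        "topic": "software development",
        "access": ["development team"],
        "disclosure": "confidential",
        "documents": ["S/W Development Methodology"],
    },
    "company-life": {
        "topic": "general company information",
        "access": ["all employees"],
        "disclosure": "public",
        "documents": ["Company Life Guide"],
    },
}

def metadata_for(doc_title: str) -> dict:
    """Every document inherits the metadata of its collection."""
    for meta in collections.values():
        if doc_title in meta["documents"]:
            return {k: v for k, v in meta.items() if k != "documents"}
    raise KeyError(doc_title)

print(metadata_for("Company Life Guide")["disclosure"])  # public
```

Assigning metadata once per collection rather than once per document is what keeps the tagging effort manageable as the document set grows.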
Corporate chatbots can serve a variety of purposes. For instance, there may be chatbots designed for specific departments or roles (e.g., a chatbot for security monitoring staff in the IT security department) and chatbots focused on specific areas for all employees (e.g., a chatbot supporting company welfare-related tasks). In ThanoSQL, such diverse use cases are referred to as 'scenarios.' Since each scenario requires different RAG data, effectively mapping the document collections defined earlier to these scenarios can save time and resources by preventing the retriever from searching through unnecessary data. Moreover, using only the relevant data enhances the accuracy of the chatbot's responses.
For example, if a legal officer inquires about the subcontracting terms of a specific project, the chatbot selects the relevant collections, such as the project execution collection and the laws/regulations collection, and performs searches only within those collections. By avoiding unrelated collections, the chatbot not only minimizes the risk of generating false information but also reduces response time.
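The scenario-to-collection mapping in the legal-officer example can be sketched as follows. All collection names, scenario names, and document contents are illustrative, and a simple keyword match stands in for vector similarity search.

```python
# Sketch of scenario-to-collection mapping: the retriever searches only
# the collections assigned to the active scenario. Names and documents
# are illustrative; keyword matching substitutes for vector search.
collections = {
    "project-execution": ["Subcontract terms for Project X: payment within 60 days."],
    "laws-regulations": ["Fair Transactions in Subcontracting Act, Article 13."],
    "welfare": ["Gym membership is reimbursed up to $50 per month."],
}

scenarios = {
    "legal-support": ["project-execution", "laws-regulations"],
    "welfare-chatbot": ["welfare"],
}

def retrieve(scenario: str, query: str) -> list[str]:
    """Match the query only against collections mapped to the scenario."""
    hits = []
    for name in scenarios[scenario]:
        for doc in collections[name]:
            if any(word in doc.lower() for word in query.lower().split()):
                hits.append(doc)
    return hits

print(retrieve("legal-support", "subcontracting terms"))
```

Because the `welfare` collection is never scanned for the `legal-support` scenario, the search space shrinks and irrelevant context cannot leak into the prompt, which is precisely the speed and accuracy benefit described above.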
5. Conclusion
Depending on the types and characteristics of a company’s documents and the variety of scenarios to be applied, additional metadata beyond the three mentioned earlier may be required. While tagging metadata for collection configuration can be a labor-intensive process, certain aspects can be automated with AI. This step is essential to improve the speed and accuracy of RAG searches.
By implementing RAG through the mapping of collections and scenarios using ThanoSQL, companies can deliver high-quality services efficiently across various domains.