Retrieval Augmented Generation

In previous articles, we talked about generative AI, its benefits, and the risks that come with it. One such risk is that generative AI can hallucinate. It also has no access to the information your organization keeps internally. Retrieval augmented generation (RAG) addresses both issues. In this article, we answer the following questions: What is retrieval augmented generation? What are the benefits? And how can you use retrieval augmented generation with Copilot & SharePoint?

What is retrieval augmented generation?

Wikipedia defines retrieval augmented generation (or RAG) as “a technique that enables large language models (LLMs) to retrieve and incorporate new information. With RAG, LLMs do not respond to user queries until they refer to a specified set of documents. These documents supplement information from the LLM’s pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this helps LLM-based chatbots access internal company data or generate responses based on authoritative sources. RAG improves large language models (LLMs) by incorporating information retrieval before generating responses.”

In other words, RAG enhances large language models by connecting them to external knowledge sources. Instead of relying solely on the information the model learned during training, RAG first retrieves relevant documents or data from an external source, such as a database or your knowledge base. It then uses that retrieved information to generate more accurate and up-to-date responses.

The basic idea is simple: when you ask a question, the system searches through a collection of documents (like company files, research papers, or websites) to find relevant information. Then it feeds both your question and those retrieved documents to the language model. The model uses this context to produce an answer that’s grounded in your specific data rather than just its own general training knowledge. So, those are the three steps of retrieval augmented generation, sketched in code after the list below:

  • Retrieval: When a user asks a question, the RAG system searches an external knowledge base (like a company’s specific documents) for relevant information. 
  • Augmentation: The retrieved information is then added to the original prompt, creating an “augmented” request. 
  • Generation: The large language model (LLM) then generates a response based on this augmented prompt, using the external data to provide a more specific and accurate answer. 

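To make these three steps concrete, here is a minimal sketch in Python. The document collection, the word-overlap scoring, and the generate() stub are illustrative assumptions; a real system would typically use embedding-based search and an actual LLM API.

```python
# A minimal sketch of the three RAG steps: retrieval, augmentation, generation.
# The documents and the retention-policy example are invented for illustration.

DOCUMENTS = [
    "Our firm's retention policy keeps case files for ten years.",
    "Client intake forms must be stored in the matter's SharePoint site.",
    "Invoices are issued within 30 days of closing a matter.",
]

def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Step 1 - Retrieval: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def augment(question: str, context: list[str]) -> str:
    """Step 2 - Augmentation: add the retrieved text to the original prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Step 3 - Generation: placeholder for a call to any LLM API."""
    return f"[LLM response grounded in:]\n{prompt}"

question = "How long do we keep case files?"
print(generate(augment(question, retrieve(question, DOCUMENTS))))
```
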
This approach solves several common problems with standard LLMs: it a) reduces hallucinations because the model bases its answers on actual retrieved text, b) allows the system to access current information beyond the model’s training cutoff date, and c) lets you use domain-specific knowledge without having to retrain the entire model. RAG is particularly useful for applications like customer support systems that need company-specific information, research assistants that work with scientific literature, or any scenario where you need accurate answers based on a particular knowledge base.

Now, when you start researching retrieval augmented generation, you will often encounter the terms pipes or pipelines. These refer to the processing steps that transform a user’s query into a final response. They’re essentially the workflow or data flow that connects the different components of a RAG system. The “pipe” metaphor comes from Unix pipes, where data flows from one process to another.

Different RAG implementations can have varying pipeline architectures. Some are simple, with just query, retrieve, and generate stages; others are complex, with multiple retrieval steps, feedback loops, or parallel processing paths.
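
What such a pipeline can look like in code is sketched below. The stage names and the shared-state dictionary are illustrative choices, not taken from any particular framework; the point is that stages are interchangeable parts that data flows through.

```python
# RAG stages as composable functions: each one takes and returns a shared
# state dict, so pipelines can be extended or rearranged like Unix pipes.

from typing import Callable

Stage = Callable[[dict], dict]

def run_pipeline(stages: list[Stage], state: dict) -> dict:
    for stage in stages:      # data flows from one stage into the next
        state = stage(state)
    return state

def parse_stage(state: dict) -> dict:
    state["query"] = state["raw_input"].strip().lower()
    return state

def retrieve_stage(state: dict) -> dict:
    state["context"] = [f"(documents matching: {state['query']})"]
    return state

def rerank_stage(state: dict) -> dict:
    state["context"] = sorted(state["context"])  # an optional extra step
    return state

def generate_stage(state: dict) -> dict:
    state["answer"] = f"Answer based on {state['context']}"
    return state

simple_rag = [parse_stage, retrieve_stage, generate_stage]
extended_rag = [parse_stage, retrieve_stage, rerank_stage, generate_stage]

print(run_pipeline(simple_rag, {"raw_input": "What is RAG?"})["answer"])
```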

What are the benefits?

RAG offers several benefits that make it attractive for real-world applications.

The fact that it offers access to current and specific information is perhaps the most obvious advantage. Since the model retrieves information from your own database or documents, it can work with a) data that’s more recent than its training cutoff or b) highly specialized knowledge that wasn’t in its original training data. This means companies can get accurate answers about their latest policies, recent research papers, or proprietary information. For law firms, depending on how you set it up, this means the system can draw on your legal documentation, your knowledge base, and your case files and documents.

As mentioned in the introduction, reduced hallucinations are another major benefit. When language models generate answers purely from their training, they sometimes confidently state incorrect information. RAG grounds the model’s responses in actual retrieved documents, making it cite or base its answers on real sources rather than just making things up. The result is output that is more reliable and trustworthy.

Another significant benefit is cost-effectiveness. With RAG you don’t need to fine-tune or retrain large language models every time your information changes. Instead, you simply update your document database, and the RAG system will retrieve the new information. This is far cheaper and faster than retraining models, which requires substantial computational resources and technical expertise.
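
Continuing the toy sketch from earlier (the updated policy document is invented for illustration), such an update is just a change to the document collection; no model weights are touched:

```python
# When a policy changes, you update the documents, not the model.
DOCUMENTS.append("Update: case files are now retained for twelve years.")

question = "How long do we keep case files?"
print(generate(augment(question, retrieve(question, DOCUMENTS))))
# The retriever now surfaces the updated policy without any retraining.
```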

RAG also addresses the issues of transparency and traceability because you can see which documents the system retrieved to answer a question. This makes it easier to verify answers, debug problems, and build trust with users who can check the sources themselves.
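
One common way to support this, sketched below with illustrative names, is to return the retrieved source identifiers alongside the generated answer so users can verify them:

```python
# A sketch of traceability: pair each answer with the sources it was based on.
from dataclasses import dataclass

@dataclass
class GroundedAnswer:
    text: str
    sources: list[str]  # e.g. document titles or SharePoint URLs

def answer_with_sources(question: str, retrieved: dict[str, str]) -> GroundedAnswer:
    """`retrieved` maps a source identifier to the snippet it contributed."""
    context = "\n".join(retrieved.values())
    # ...call the LLM with context + question here...
    return GroundedAnswer(
        text=f"[answer grounded in {len(retrieved)} source(s)]",
        sources=list(retrieved.keys()),
    )

result = answer_with_sources(
    "How long do we keep case files?",
    {"retention-policy.docx": "Case files are kept for ten years."},
)
print(result.text)
print("Sources:", result.sources)
```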

A final benefit is referred to as domain adaptability. It means that you can quickly deploy the same base model across different domains or use cases by simply swapping out the document collection it retrieves from. One model can serve medical applications, legal research, or customer support just by changing the underlying knowledge base.

Retrieval augmented generation with Copilot & SharePoint

For law firms that use Copilot and SharePoint, it is interesting to know that Copilot can be used in combination with SharePoint to enable RAG responses. Microsoft has made this integration quite powerful.

How does it work? Microsoft 365 Copilot offers a retrieval API that allows developers to ground generative AI responses in organizational data stored in SharePoint, OneDrive, and Copilot connectors. This means you can build custom AI solutions that retrieve relevant text snippets from SharePoint without needing to replicate or re-index the data elsewhere. The API understands user context and intent, performs query transformations, and returns highly relevant results from your Microsoft 365 content.
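
As a rough sketch of what a call to this retrieval API can look like (the endpoint and payload below follow Microsoft’s published examples at the time of writing, but the request schema, response fields, and authentication flow should all be verified against the current documentation):

```python
# A hedged sketch of calling the Microsoft 365 Copilot Retrieval API via
# Microsoft Graph. Obtaining ACCESS_TOKEN (OAuth through Microsoft Entra ID)
# is elided; field names are based on Microsoft's published examples and
# should be checked against the current docs.

import requests

ACCESS_TOKEN = "<token from Microsoft Entra ID>"  # placeholder

response = requests.post(
    "https://graph.microsoft.com/beta/copilot/retrieval",
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "queryString": "What is our document retention policy?",
        "dataSource": "sharePoint",  # ground the query in SharePoint content
        "maximumNumberOfResults": 5,
    },
)

# Each hit should contain a source URL and text extracts you can feed to an
# LLM as grounding context.
for hit in response.json().get("retrievalHits", []):
    print(hit.get("webUrl"))
```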

This approach offers several advantages for RAG implementations. You don’t need to set up separate vector databases: you can skip the traditional RAG setup that involves chunking, embedding, and indexing documents. The API automatically respects existing access controls and governance policies, which helps ensure security and compliance. Additionally, you can combine SharePoint data with other Microsoft 365 and third-party sources to create richer, more comprehensive responses.

For personal experimentation

If you would like to first experiment on your own, you can try Google’s NotebookLM, which implements RAG technology. It’s an AI-powered research and writing assistant that helps users summarize and understand information from uploaded sources or specific websites.

Sources: