Better RAG with LOTR-Lord of Retriever

Implementing LOTR merger retriever & solving the ‘Lost in the Middle’ Challenge in RAG Systems

Understanding RAG and Its Components

RAG combines two elements: retrieval and generation. Retrieval uses techniques such as semantic search to find pertinent information in large volumes of data, which can include text, images, audio, and video. The retrieved chunks then serve as the foundation for the generation phase, in which a large language model interprets them and crafts a coherent, human-like response. Because answers are grounded in retrieved data, RAG systems can deliver more nuanced and accurate outputs than a generative model used on its own.

The ‘Lost in the Middle’ Phenomenon

The ‘Lost in the Middle’ (LIM) problem is a significant challenge for RAG and LLMs. Studies from universities like Stanford and UC Berkeley have highlighted this issue, which is similar to how people often remember the first and last items on a grocery list but forget those in the middle. Language models behave in a similar way: they are good at using information at the beginning or end of the text they are given, but they tend to overlook key details in the center. The issue becomes more noticeable when a model has to process information retrieved from a wide range of sources. It is like trying to remember a specific detail from a movie after watching several in a row: the middle parts can become jumbled or forgotten.

Overcoming this Challenge with Advanced Techniques

  1. Avoid Single Knowledge Bases: Using just one knowledge base for different types of documents can confuse retrieval models. They might struggle to find the right information based on the topic or context.
  2. Use Multiple Vector Stores: Create separate data storage areas (called VectorStores) for different types of documents. This helps in organizing information more effectively.
  3. Merge Information with Merge Retriever: Combine the data from these various VectorStores using a tool called Merge Retriever. This helps in bringing together relevant information from different sources.
  4. Reorder with LongContextReorder: Use the LongContextReorder transformer to rearrange the retrieved documents so that the most relevant ones sit at the beginning and end of the context, where models use them best, and the least relevant ones sit in the middle.
  5. Balance Data Assessment: Combining these techniques, LOTR (MergerRetriever) for merging and LongContextReorder for ordering, reduces the chance that relevant information in the middle of the context is overlooked when generating responses.

These steps help in improving the performance of RAG systems, making them more efficient in handling and interpreting vast and varied information sources.

LOTR (Merger Retriever)

LOTR: Lord of the Retrievers, also known as MergerRetriever, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.

The MergerRetriever class can improve document retrieval in two ways. First, it combines the results of multiple retrievers, which helps reduce the bias of relying on any single retriever or embedding model. Second, it preserves the ranking produced by each retriever when interleaving their results, so the top documents from every retriever appear early in the merged list.

Now we are going to build a RAG system for medical and healthcare questions. Our chatbot will be able to answer healthcare questions based on the documents we load. Let's start coding.

Code Implementation

Install required packages

!pip -q install langchain lancedb  pypdf sentence-transformers openai tiktoken

Import required packages

from langchain.embeddings import (
    HuggingFaceBgeEmbeddings,
    OpenAIEmbeddings,
    SentenceTransformerEmbeddings,
)
# Filters used later to drop redundant results and to cluster documents
from langchain.document_transformers import (
    EmbeddingsClusteringFilter,
    EmbeddingsRedundantFilter,
)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.retrievers.merger_retriever import MergerRetriever
from langchain.schema import Document
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import LanceDB
import lancedb

Set up the OpenAI API key

import os
os.environ["OPENAI_API_KEY"] = "sk-yourkeyplease"

Setup the Embedding Models

We are using three different embedding models.

1. Hugging Face BGE embeddings (BAAI/bge-large-en): ranked near the top of the MTEB leaderboard at the time of writing.

2. NeuML/pubmedbert-base-embeddings: a model focused specifically on medical data.

3. OpenAI embedding model: used to detect and remove near-duplicate results. We will explore this in a later code block.

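# Embedding models

# Domain-specific model for medical text
medical_health_embedding = SentenceTransformerEmbeddings(
    model_name="NeuML/pubmedbert-base-embeddings")

# General-purpose BGE model
hf_bge_embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-large-en",
                                             encode_kwargs = {'normalize_embeddings': False})
# Used only by the redundancy filter
filter_embeddings = OpenAIEmbeddings()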

Load the document file

Download the PDF from here

loader = PyPDFLoader("/content/AyurvedicHomeRemedies.pdf")
# Load the PDF into a list of Document objects (one per page)
pages = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
# Split the pages into overlapping chunks for embedding
text = text_splitter.split_documents(pages)

Instantiate two different LanceDB indexes with different embeddings

# embedding 1 model - NeuML/pubmedbert-base-embeddings
db = lancedb.connect('/tmp/lancedb')
table = db.create_table("health embedding", data=[
    {"vector": medical_health_embedding.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")

# Initialize LanceDB retriever
db_all = LanceDB.from_documents(text, medical_health_embedding, connection=table)

# embedding 2 model - BAAI/bge-large-en
db_multi = lancedb.connect('/tmp/lancedb')
table = db_multi.create_table("bge embedding", data=[
    {"vector": hf_bge_embeddings.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")
# Initialize LanceDB retriever
db_multiqa = LanceDB.from_documents(text, hf_bge_embeddings, connection=table)

Merge all the retrievers

This will hold the outputs from both the retrievers and can be used as any other retriever on different types of chains.

# Each retriever returns its top 5 most similar chunks
retriever_med = db_all.as_retriever(search_type="similarity",
                                    search_kwargs={"k": 5, "include_metadata": True})
retriever_bge = db_multiqa.as_retriever(search_type="similarity",
                                        search_kwargs={"k": 5, "include_metadata": True})

LOTR: MergerRetriever merges the two retrievers

The MergerRetriever, often referred to as LOTR, functions by combining the findings from various retrieval sources in a sequential, round-robin manner. It starts by collecting relevant documents identified by each retriever and then amalgamates these into a single, cohesive list. This list is effectively organized, showcasing documents that are pertinent to the specific query and ranked according to their relevance as determined by the different retrievers.
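
To make the round-robin idea concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: the function name round_robin_merge and the sample result lists are made up for illustration, and each inner list stands in for one retriever's ranked results.

```python
from itertools import zip_longest

# Sentinel marking "this retriever has no more results" (hypothetical helper)
_MISSING = object()


def round_robin_merge(ranked_lists):
    """Interleave several ranked lists: all rank-1 items, then all rank-2 items, and so on."""
    merged = []
    for tier in zip_longest(*ranked_lists, fillvalue=_MISSING):
        # Skip the padding added for lists that are shorter than the longest one
        merged.extend(doc for doc in tier if doc is not _MISSING)
    return merged


# Toy stand-ins for the outputs of two retrievers, best match first
med_results = ["med-1", "med-2", "med-3"]
bge_results = ["bge-1", "bge-2"]

merged = round_robin_merge([med_results, bge_results])
print(merged)
```

In this sketch the interleaving keeps each list's own order but does not rescore documents across lists, which is why a separate filtering and reordering step is useful afterwards.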

To enhance the efficiency of this merged list and avoid repetition, the EmbeddingsRedundantFilter can be employed with an additional embedding model. This helps in filtering out any overlapping or duplicate results from the combined retrievers. Additionally, the documents can be grouped into thematic clusters or ‘centers’ of related content. From these clusters, the document that most closely aligns with the central theme of each cluster is selected for the final compilation. This clustering and selection process is facilitated by the EmbeddingsClusteringFilter, ensuring a more organized and focused set of results.

lotr = MergerRetriever(retrievers=[retriever_med, retriever_bge])

# Print the merged chunks returned by both retrievers
for chunks in lotr.get_relevant_documents("What is use of tulsi ?"):
    print(chunks.page_content)

First, remove redundant results from the merged retrievers.

# Remove redundant results from the merged retrievers.
# EmbeddingsRedundantFilter drops redundant documents by comparing their embeddings.
filter = EmbeddingsRedundantFilter(embeddings=filter_embeddings)
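
The idea of dropping documents whose embeddings are nearly identical can be sketched in plain Python. This is a simplified illustration, not the library's code: the function names, the 0.95 threshold, and the toy two-dimensional vectors are all assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant(texts, vectors, threshold=0.95):
    """Keep a text only if it is not too similar to any text already kept."""
    kept_texts, kept_vectors = [], []
    for text, vector in zip(texts, vectors):
        if all(cosine_similarity(vector, kv) < threshold for kv in kept_vectors):
            kept_texts.append(text)
            kept_vectors.append(vector)
    return kept_texts


# Toy data: the second vector is almost identical to the first
texts = ["tulsi for cough", "tulsi for cough (copy)", "clove for toothache"]
vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]

unique_texts = drop_redundant(texts, vectors)
print(unique_texts)
```

Since both retrievers index the same PDF, they will often return the same chunk, so a step like this keeps the merged list from spending context on duplicates.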

Re-order results to avoid performance degradation.

Regardless of model architecture, performance can degrade substantially when you include 10 or more retrieved documents. In brief: when models must access relevant information in the middle of long contexts, they tend to overlook it. This is the ‘Lost in the Middle’ effect described earlier.

from langchain.document_transformers import LongContextReorder

# Place the most relevant documents at the start and end of the list
reordering = LongContextReorder()

# Apply the redundancy filter first, then the reordering step
pipeline = DocumentCompressorPipeline(transformers=[filter, reordering])
# k is already set on each base retriever, so no search_kwargs are passed here
compression_retriever_reordered = ContextualCompressionRetriever(
    base_compressor=pipeline, base_retriever=lotr
)

docs = compression_retriever_reordered.get_relevant_documents("What is use of tulsi ?")
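
The reordering step used above can be pictured with a small sketch. This is an illustration of the general idea under stated assumptions, not a copy of the library's code: the function name reorder_for_long_context is hypothetical, and the integers stand in for documents ranked from most relevant (1) to least relevant (5).

```python
def reorder_for_long_context(docs_by_relevance):
    """Move the most relevant items to the ends of the list and the least relevant to the middle.

    The input is assumed to be sorted from most relevant to least relevant.
    """
    reordered = []
    # Walk from least to most relevant, alternately placing items at the front and the back,
    # so the items seen last (the most relevant ones) end up at the outer edges.
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


# 1 = most relevant, 5 = least relevant
ranked = [1, 2, 3, 4, 5]
reordered = reorder_for_long_context(ranked)
print(reordered)
```

With five items, the least relevant one lands in the center and the two most relevant ones land at the first and last positions, which are the positions that models tend to use best.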


Load the LLM model

To use a custom LLM, check our blog

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(openai_api_key="sk-openaiapikey")
# check our blog for using different LLMs

# Build a question-answering chain on top of the filtered, reordered retriever
qa = RetrievalQA.from_chain_type(
      llm = llm,
      retriever = compression_retriever_reordered,
      return_source_documents = True
)
query = "What is use of tulsi?"
results = qa(query)
# The answer text is stored under the "result" key
print(results["result"])


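Results from the LLM (the sample output below answers a question about high fever and cough, not the tulsi query shown above):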

## results
For high fever and cough, you can try the following home remedies:

1. Take 1-2 grams of Pippali (Piper longum) powder with honey twice daily.
2. Drink a warm decoction prepared from 20 ml of water and 1 gram of Laung (clove) 3-4 times daily. This can help with both dry and productive cough.
3. Take 2 grams of Elaichi (cardamom) powder with honey 2-3 times a day.
4. Drink plenty of warm fluids like herbal teas, soups, and warm water to stay hydrated and soothe the throat.
5. Gargle with warm salt water to alleviate throat discomfort.
6. Rest and get plenty of sleep to support your immune system.

Remember, these remedies are for mild conditions. If your symptoms persist or worsen, it is important to consult a doctor for proper diagnosis and treatment.

That's it. Here is the full code & Colab

This is how you can use LOTR (Lord of the Retrievers) with domain-specific embedding models to build a better-performing RAG system.


To tackle the LIM problem and boost retrieval performance, it helps to restructure the retrieval side of a RAG system. By setting up separate VectorStores, combining them with MergerRetriever, and rearranging the results with LongContextReorder, we can lessen LIM issues and make retrieval more effective. Incorporating domain-specific embeddings among the merged retrievers also plays a key role. Together, these steps reduce the risk of missing important details in the middle of the documents we retrieve.

Stay tuned for upcoming blogs where we’ll take a deeper dive into the captivating realm of Large Language Models (LLMs). If you’ve found our exploration enlightening, we’d greatly appreciate your support. Be sure to leave a like!

But that’s not all. For even more exciting applications of vector databases and Large Language Models (LLMs), be sure to explore the LanceDB repository. LanceDB offers a powerful and versatile vector database that can revolutionize the way you work with data.

Explore the full potential of this cutting-edge technology by visiting the vector-recipes repository. It’s filled with real-world examples, use cases, and recipes to inspire your next project. We hope you found this journey both informative and inspiring. Cheers!