Questions & Answers
What is the Architecture of a RAG System?
Alex Kagan, NLP Researcher and ML Engineer, GigaSpaces answered
Retrieval-Augmented Generation (RAG) systems extend large language models (LLMs) with access to external data. RAG architecture combines generative capabilities with external data retrieval mechanisms to produce more accurate, contextually relevant, and up-to-date responses than an LLM that relies on its training data alone. This integration is often referred to as the LLM RAG pattern, which highlights the relationship between retrieval and generation in modern AI applications.
A RAG system’s architecture comprises four interconnected and essential components:
- Input: A user’s request or prompt that provides the context and sets the boundaries for the system response. Typically a question, command, or other type of query, the input specifies the information the user is seeking.
- Retriever: The retriever is what sets RAG apart from traditional LLMs. It fetches or retrieves relevant data from external sources to enrich the system’s response. Possible sources include, but are not limited to, databases, knowledge repositories, and web pages.
- Generator: Once the retriever has fetched relevant data, the generator synthesizes the knowledge from the retrieved documents with the context of the input query to craft a fluent, coherent response that is both accurate and human-like.
- Output: The response delivered to the user, shaped by the generator’s synthesis of both the query input and the external data.
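The four components above can be sketched as a short pipeline. This is a minimal illustration, not a production design: the `Document` type, the word-overlap scoring in `retrieve`, the prompt wording, and the `echo_llm` stand-in are all assumptions made for the example, and a real system would call an actual language model where `llm` is used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, corpus: list[Document], top_k: int = 2) -> list[Document]:
    """Retriever: rank documents by how many words they share with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc.text)), doc) for doc in corpus]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:top_k]]


def build_prompt(query: str, context: list[Document]) -> str:
    """Combine the input and the retrieved documents into one prompt."""
    sources = "\n".join(f"[{doc.doc_id}] {doc.text}" for doc in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


def answer(query: str, corpus: list[Document], llm: Callable[[str], str]) -> str:
    """Input -> Retriever -> Generator -> Output."""
    context = retrieve(query, corpus)
    return llm(build_prompt(query, context))


corpus = [
    Document("manual-1", "To reset the router, hold the power button for ten seconds."),
    Document("manual-2", "The warranty covers hardware defects for two years."),
    Document("manual-3", "Firmware updates are installed from the settings menu."),
]


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: it returns the prompt unchanged."""
    return prompt


print(answer("How do I reset the router?", corpus, echo_llm))
```

Because the toy retriever does not filter common words such as "the", weakly related documents can also appear in the context; real retrievers use the scoring methods described in the next section.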
Understanding RAG Retrievers
As noted, the retriever is the key component of a RAG model architecture. There are three primary types of retrievers:
- Sparse Retrievers: These utilize traditional search techniques like TF-IDF or BM25 to match query terms with document terms based on frequency and weighting. They are ideal for precise keyword matching in large datasets and are commonly used in libraries, enterprise document systems, and keyword-driven search engines.
- Dense Retrievers: These encode queries and documents into dense vector representations using models like BERT, comparing them via similarity measures to capture semantic meaning. They are suited for nuanced queries in conversational AI, recommendation systems, and open-domain Q&A.
- Domain-Specific Retrievers: Fine-tuned for specialized fields like healthcare, law, or academia, these retrievers incorporate domain-specific terminology and context for accurate results. They excel in tasks requiring in-depth knowledge of a specific subject area.
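To make the difference between sparse and dense retrieval concrete, the sketch below scores the same small corpus both ways. The BM25 function follows the commonly used Okapi BM25 formula with typical default parameters (`k1=1.5`, `b=0.75`). The dense half is only an approximation of the idea: the three-dimensional vectors are hand-written placeholders standing in for embeddings that a model such as BERT would produce.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return [word.strip(".,?!").lower() for word in text.split()]


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Sparse retrieval: score each document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue  # exact term match is required
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    """Dense retrieval: compare two embedding vectors by cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "BM25 ranks documents by term frequency and document length.",
    "Dense retrievers compare embedding vectors.",
    "Court rulings cite earlier precedent.",
]

# Sparse: only documents sharing exact query terms receive a score.
sparse_scores = bm25_scores("term frequency ranking", docs)

# Dense: placeholder vectors (assumed, not produced by a real model).
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [[0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.1, 0.9]]
dense_scores = [cosine(query_vec, vec) for vec in doc_vecs]

print(sparse_scores)
print(dense_scores)
```

Note that the query word "ranking" contributes nothing to the BM25 score because the document contains "ranks" instead; closing that kind of gap is what dense retrievers are meant to do.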
Types of RAG System Architectures
The workflow in the first section of this page represents the architecture of a Simple RAG system. This type of system is best suited for use cases such as customer support bots, FAQ systems, or any use case that necessitates accurate responses from a limited scope of information, such as a product manual. However, there are many other types of RAG model architectures.
Other types of LLM RAG architecture include, but are not limited to:
Simple RAG with Memory
This type of RAG system improves on simple RAG by incorporating memory storage. When a user inputs a query, the system retrieves information not only from external sources but also from past interactions to ensure continuity. It is useful for applications that involve ongoing interactions and therefore require the model to remember information such as user preferences.
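One way to picture the memory component is as a second collection that is searched alongside the external corpus. The sketch below is an assumption-laden illustration: the `MemoryRAG` class, its word-overlap ranking, and the choice to store raw user turns are all hypothetical, and real systems often store summaries or embeddings instead.

```python
def tokenize(text: str) -> set[str]:
    return {word.strip(".,?!").lower() for word in text.split()}


class MemoryRAG:
    """Hypothetical retriever that searches past user turns as well as a corpus."""

    def __init__(self, corpus: list[str]) -> None:
        self.corpus = list(corpus)
        self.memory: list[str] = []  # earlier user inputs, oldest first

    def retrieve(self, query: str, top_k: int = 2) -> list[tuple[str, str]]:
        query_terms = tokenize(query)
        candidates = [("memory", text) for text in self.memory]
        candidates += [("corpus", text) for text in self.corpus]
        scored = [(len(query_terms & tokenize(text)), origin, text)
                  for origin, text in candidates]
        matches = [item for item in scored if item[0] > 0]
        matches.sort(key=lambda item: item[0], reverse=True)
        return [(origin, text) for _, origin, text in matches[:top_k]]

    def ask(self, query: str) -> list[tuple[str, str]]:
        context = self.retrieve(query)
        self.memory.append(query)  # remember this turn for later queries
        return context


rag = MemoryRAG([
    "Lentil soup is a vegetarian recipe.",
    "Grilled steak needs high heat.",
])
print(rag.ask("I prefer vegetarian recipes"))
print(rag.ask("Suggest recipes I would prefer"))
```

On the second call, the earlier statement of preference is retrieved from memory, which is the continuity the paragraph above describes.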
Branched RAG
Instead of querying all data sources, Branched RAG systems evaluate each query to determine the most relevant data source and retrieve information from it alone. This type of architecture is suited for more complex queries, such as for legal or multidisciplinary research, as it reduces the risk of collecting irrelevant data.
HyDE (Hypothetical Document Embeddings)
HyDE RAG is unique in that it generates a hypothetical document representation (a representation of what an ideal document might look like) based on the user’s query and uses this document to guide retrieval. It’s best used for vague or complex queries and creative content generation.
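A rough sketch of this flow follows. It is a simplification under stated assumptions: `fake_llm` is a hypothetical stand-in that returns a canned passage instead of calling a model, and `embed` uses plain word counts where a real HyDE setup would use a dense embedding model.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption, not a real encoder)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hyde_retrieve(query: str, corpus: list[str],
                  write_hypothetical: Callable[[str], str], top_k: int = 1) -> list[str]:
    """Embed a hypothetical answer to the query and retrieve documents near it."""
    hypothetical = write_hypothetical(query)
    target = embed(hypothetical)
    ranked = sorted(corpus, key=lambda doc: cosine(target, embed(doc)), reverse=True)
    return ranked[:top_k]


def fake_llm(query: str) -> str:
    """Hypothetical generator: a real system would ask an LLM to draft this passage."""
    return "Plants often wilt from overwatering, root rot, or too little light."


corpus = [
    "Overwatering causes root rot and wilting leaves.",
    "Routers should be rebooted monthly.",
    "Most houseplants need repotting every two years.",
]

query = "Why is my plant dying?"
print(cosine(embed(query), embed(corpus[0])))   # the vague query shares no words
print(hyde_retrieve(query, corpus, fake_llm))   # the hypothetical document does
```

The vague query has no terms in common with the relevant document, while the drafted passage does, which is why retrieval is guided by the hypothetical document instead of the raw query.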
Adaptive RAG
As the name suggests, Adaptive RAG adjusts its retrieval strategy based on the complexity of the query, leveraging either simple or multi-source retrieval as required. It works best for enterprise search systems, as it handles both simple and complex queries efficiently, reducing overhead.
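The strategy switch can be illustrated with a small complexity check placed in front of retrieval. Everything specific here is an assumption for the sake of the example: the marker words, the 12-word length cutoff, the source names, and the word-overlap `search` helper. Real systems may use a trained classifier to judge complexity.

```python
def tokenize(text: str) -> set[str]:
    return {word.strip(".,?!").lower() for word in text.split()}


# Hypothetical signals that a query has several parts.
MULTI_PART_MARKERS = {"and", "compare", "versus", "vs"}


def classify(query: str) -> str:
    """Label a query 'simple' or 'complex' with a crude heuristic."""
    is_long = len(query.split()) > 12
    has_marker = bool(tokenize(query) & MULTI_PART_MARKERS)
    return "complex" if is_long or has_marker else "simple"


def search(query: str, docs: list[str]) -> list[str]:
    """Return documents sharing at least one word with the query."""
    terms = tokenize(query)
    return [doc for doc in docs if terms & tokenize(doc)]


def adaptive_retrieve(query: str, sources: dict[str, list[str]],
                      primary: str) -> dict[str, list[str]]:
    """Simple queries hit one source; complex queries fan out to all of them."""
    chosen = [primary] if classify(query) == "simple" else list(sources)
    return {name: search(query, sources[name]) for name in chosen}


sources = {
    "policies": ["The refund policy allows returns within 14 days."],
    "warranty": ["The warranty terms cover parts for one year."],
}

print(adaptive_retrieve("What is our refund policy?", sources, primary="policies"))
print(adaptive_retrieve("Compare the refund policy and the warranty terms",
                        sources, primary="policies"))
```

Skipping the extra sources for simple queries is where the reduction in overhead comes from.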
Corrective RAG (CRAG)
Corrective RAG evaluates the quality and relevance of retrieved documents, filtering out irrelevant ones. If the initial retrieval is insufficient, additional searches are conducted to refine the data. As such, it’s ideal for high-stakes scenarios like legal, medical, or financial contexts where accuracy is critical.
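The filter-then-retry behavior described above might look like the following sketch. The relevance measure (fraction of query words found in a document), the `0.5` threshold, and the idea of a separate `fallback` collection are assumptions chosen for illustration; actual Corrective RAG implementations typically use a trained evaluator and a broader follow-up search.

```python
def tokenize(text: str) -> set[str]:
    return {word.strip(".,?!").lower() for word in text.split()}


def relevance(query: str, doc: str) -> float:
    """Fraction of query words that appear in the document (0.0 to 1.0)."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    return len(terms & tokenize(doc)) / len(terms)


def corrective_retrieve(query: str, primary: list[str], fallback: list[str],
                        threshold: float = 0.5, min_docs: int = 1) -> list[str]:
    """Keep only relevant documents; search the fallback if too few survive."""
    kept = [doc for doc in primary if relevance(query, doc) >= threshold]
    if len(kept) < min_docs:
        # Initial retrieval was insufficient, so run an additional search.
        kept += [doc for doc in fallback if relevance(query, doc) >= threshold]
    return kept


primary = [
    "Domestic transfer limits reset daily.",
    "Branch hours are nine to five.",
]
fallback = [
    "The international wire transfer fee is listed in the fee schedule.",
]

print(corrective_retrieve("international wire transfer fee", primary, fallback))
```

Here both primary documents fall below the threshold, so the result comes entirely from the additional search.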
In summary, Retrieval-Augmented Generation (RAG) systems extend large language models by combining retrieval mechanisms with generative AI. From Simple RAG to advanced configurations like HyDE and Corrective RAG, these systems adapt to a wide range of applications, supporting accurate, context-aware, and up-to-date responses. RAG’s versatility makes it a cornerstone for intelligent information retrieval and response generation across industries.

