Naive RAG Pattern

Overview

The Naive RAG pattern establishes the fundamental architecture for Retrieval-Augmented Generation systems: a single retrieve-then-generate pass in which the top-k retrieved passages are fed directly to the generator, with no query rewriting, reranking, or iterative refinement. This pattern serves as the baseline for understanding more complex RAG implementations.

Architectural Diagram

                +-------------------+       +-------------------+       +-------------------+
                |                   |       |                   |       |                   |
                |    User Query     | ----> |    Retriever      | ----> |    Generator      |
                |                   |       |                   |       |                   |
                +-------------------+       +-------------------+       +-------------------+
                                                    |     ^                     |
                                                    v     |                     v
                                            +-------------------+       +-------------------+
                                            |  Knowledge Base   |       |  Final Response   |
                                            |                   |       |                   |
                                            +-------------------+       +-------------------+
                

Key Characteristics

Data Flow

  1. The user query is received and tokenized
  2. A query embedding is generated with the query encoder
  3. The top-k most similar documents are retrieved from the knowledge base
  4. The retrieved documents are concatenated with the original query to form the generator's context
  5. The language model generates the final response, as sketched below
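
These steps map directly onto a small amount of code. The sketch below is a minimal, library-agnostic illustration of the flow; the embed and generate functions are hypothetical stand-ins for a real encoder and language model, and the three in-memory documents stand in for the knowledge base.

import numpy as np

# Toy knowledge base standing in for a real document store (illustrative only)
documents = [
    "RAG combines a retriever with a generator.",
    "The retriever returns the top-k passages for a query.",
    "The generator conditions on the query plus the retrieved passages.",
]

def embed(text):
    # Placeholder embedding: normalized bag-of-characters vector.
    # A real system would use a trained query/passage encoder here.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def generate(prompt):
    # Stand-in for the language model call in step 5.
    return "[answer generated from]\n" + prompt

doc_embeddings = np.stack([embed(d) for d in documents])  # passage index built ahead of time

def naive_rag_flow(query, k=2):
    query_vec = embed(query)                               # steps 1-2: encode the query
    scores = doc_embeddings @ query_vec                    # similarity against every passage
    top_k = np.argsort(scores)[::-1][:k]                   # step 3: pick the top-k documents
    context = "\n".join(documents[i] for i in top_k)       # step 4: concatenate docs with query
    prompt = "Context:\n" + context + "\n\nQuestion: " + query + "\nAnswer:"
    return generate(prompt)                                # step 5: generate the final response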

Implementation Example


from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

# Initialize components (the rag-sequence checkpoint matches RagSequenceForGeneration)
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-base")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-base",
    index_name="custom",
    passages_path="data/my_knowledge_base",       # placeholder: dataset saved with datasets' save_to_disk
    index_path="data/my_knowledge_base.faiss"     # placeholder: FAISS index over the passage embeddings
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-base",
    retriever=retriever
)

def naive_rag(query):
    # Tokenize the input query
    inputs = tokenizer(query, return_tensors="pt")

    # Generate a response; retrieval happens inside generate() via the attached retriever
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"]
    )

    # Decode and return the response text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
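
The custom index passed to RagRetriever above has to be built ahead of time. The following is a minimal sketch of one way to do that with the datasets library and a DPR context encoder; the passage data and output paths are placeholders chosen to match the example above, not a prescribed layout.

from datasets import Dataset
from transformers import DPRContextEncoder, DPRContextEncoderTokenizerFast
import torch

# Placeholder passages; the custom index expects "title", "text", and "embeddings" columns.
passages = Dataset.from_dict({
    "title": ["Doc 1", "Doc 2"],
    "text": ["First passage text.", "Second passage text."],
})

ctx_tokenizer = DPRContextEncoderTokenizerFast.from_pretrained("facebook/dpr-ctx_encoder-multiset-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-multiset-base")

def embed_passages(batch):
    # Encode title/text pairs into dense passage embeddings
    inputs = ctx_tokenizer(batch["title"], batch["text"], truncation=True,
                           padding="longest", return_tensors="pt")
    with torch.no_grad():
        embeddings = ctx_encoder(**inputs).pooler_output
    return {"embeddings": embeddings.numpy()}

passages = passages.map(embed_passages, batched=True, batch_size=16)
passages.save_to_disk("data/my_knowledge_base")                          # passages_path
passages.add_faiss_index(column="embeddings")
passages.get_index("embeddings").save("data/my_knowledge_base.faiss")    # index_path

With the dataset and index saved to those paths, a call such as naive_rag("What is naive RAG?") retrieves from this knowledge base transparently inside model.generate.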
            

When to Use

Limitations