Naive RAG Pattern

Overview

The Naive RAG pattern establishes the fundamental architecture for Retrieval-Augmented Generation systems: a single retrieve-then-generate pass in which the top-k retrieved passages are fed directly to the generator, with no query rewriting, reranking, or iterative refinement. This pattern serves as the baseline for understanding more complex RAG implementations.

Architectural Diagram

                +-------------------+       +-------------------+       +-------------------+
                |                   |       |                   |       |                   |
                |    User Query     | ----> |    Retriever      | ----> |    Generator      |
                |                   |       |                   |       |                   |
                +-------------------+       +-------------------+       +-------------------+
                                                    |     ^                     |
                                                    v     |                     v
                                            +-------------------+       +-------------------+
                                            |  Knowledge Base   |       |  Final Response   |
                                            |                   |       |                   |
                                            +-------------------+       +-------------------+
                

Key Characteristics

Data Flow

  1. The user query is received and tokenized
  2. A query embedding is generated with the query encoder
  3. The top-k most similar documents are retrieved from the knowledge base
  4. The retrieved documents are concatenated with the original query to form the generator's context
  5. The language model generates the final response, as sketched below
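
These steps map directly onto a small amount of code. The sketch below is a minimal, library-agnostic illustration of the flow; the embed and generate functions are hypothetical stand-ins for a real encoder and language model, and the three in-memory documents stand in for the knowledge base.

import numpy as np

# Toy knowledge base standing in for a real document store (illustrative only)
documents = [
    "RAG combines a retriever with a generator.",
    "The retriever returns the top-k passages for a query.",
    "The generator conditions on the query plus the retrieved passages.",
]

def embed(text):
    # Placeholder embedding: normalized bag-of-characters vector.
    # A real system would use a trained query/passage encoder here.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)

def generate(prompt):
    # Stand-in for the language model call in step 5.
    return "[answer generated from]\n" + prompt

doc_embeddings = np.stack([embed(d) for d in documents])  # passage index built ahead of time

def naive_rag_flow(query, k=2):
    query_vec = embed(query)                               # steps 1-2: encode the query
    scores = doc_embeddings @ query_vec                    # similarity against every passage
    top_k = np.argsort(scores)[::-1][:k]                   # step 3: pick the top-k documents
    context = "\n".join(documents[i] for i in top_k)       # step 4: concatenate docs with query
    prompt = "Context:\n" + context + "\n\nQuestion: " + query + "\nAnswer:"
    return generate(prompt)                                # step 5: generate the final response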

Implementation Example


from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

# Initialize components (the rag-sequence checkpoint matches RagSequenceForGeneration)
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-base")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-base",
    index_name="custom",
    passages_path="data/my_knowledge_base",       # placeholder: dataset saved with datasets' save_to_disk
    index_path="data/my_knowledge_base.faiss"     # placeholder: FAISS index over the passage embeddings
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-base",
    retriever=retriever
)

def naive_rag(query):
    # Tokenize the input query
    inputs = tokenizer(query, return_tensors="pt")

    # Generate a response; retrieval happens inside generate() via the attached retriever
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"]
    )

    # Decode and return the response text
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
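
The custom index passed to RagRetriever above has to be built ahead of time. The following is a minimal sketch of one way to do that with the datasets library and a DPR context encoder; the passage data and output paths are placeholders chosen to match the example above, not a prescribed layout.

from datasets import Dataset
from transformers import DPRContextEncoder, DPRContextEncoderTokenizerFast
import torch

# Placeholder passages; the custom index expects "title", "text", and "embeddings" columns.
passages = Dataset.from_dict({
    "title": ["Doc 1", "Doc 2"],
    "text": ["First passage text.", "Second passage text."],
})

ctx_tokenizer = DPRContextEncoderTokenizerFast.from_pretrained("facebook/dpr-ctx_encoder-multiset-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-multiset-base")

def embed_passages(batch):
    # Encode title/text pairs into dense passage embeddings
    inputs = ctx_tokenizer(batch["title"], batch["text"], truncation=True,
                           padding="longest", return_tensors="pt")
    with torch.no_grad():
        embeddings = ctx_encoder(**inputs).pooler_output
    return {"embeddings": embeddings.numpy()}

passages = passages.map(embed_passages, batched=True, batch_size=16)
passages.save_to_disk("data/my_knowledge_base")                          # passages_path
passages.add_faiss_index(column="embeddings")
passages.get_index("embeddings").save("data/my_knowledge_base.faiss")    # index_path

With the dataset and index saved to those paths, a call such as naive_rag("What is naive RAG?") retrieves from this knowledge base transparently inside model.generate.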
            

When to Use

Limitations