Hybrid Search Pattern

Overview

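Hybrid search combines complementary retrieval approaches and merges their results:

  1. Keyword-based search (sparse retrieval), such as TF-IDF or BM25, which rewards exact term matches
  2. Semantic search (dense retrieval) over vector embeddings, which matches meaning even when the wording differs
  3. Fusion of the two result sets using weighted scoring
  4. Dynamic adjustment of retrieval strategies, such as tuning the sparse/dense weight per query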
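Weighted scoring (step 3) only behaves well when the sparse and dense scores sit on comparable scales. Raw keyword scores such as BM25 are unbounded, while cosine similarities lie in [-1, 1]. The dependency-free sketch below assumes min-max normalization before fusion; this is one common choice, not the only one. The names `min_max_normalize`, `weighted_fusion`, and `rank` are hypothetical helpers, and the scores are made-up illustration values.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_fusion(sparse_scores, dense_scores, alpha=0.7):
    """Fuse aligned per-document scores as alpha * dense + (1 - alpha) * sparse.

    Index i in both lists must refer to the same document.
    """
    if len(sparse_scores) != len(dense_scores):
        raise ValueError("score lists must have the same length")
    sparse = min_max_normalize(sparse_scores)
    dense = min_max_normalize(dense_scores)
    return [alpha * d + (1 - alpha) * s for s, d in zip(sparse, dense)]


def rank(scores):
    """Return document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Three documents scored by each retriever (illustrative values)
sparse = [0.0, 3.0, 4.0]   # raw keyword scores on an unbounded scale
dense = [0.9, 0.8, 0.1]    # cosine similarities
fused = weighted_fusion(sparse, dense, alpha=0.6)
print(fused)
print(rank(fused))
```

In this example the keyword retriever alone prefers document 2 and the dense retriever alone prefers document 0, but document 1 scores well on both and ranks first after fusion.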

Implementation Example


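from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer
import numpy as np

class HybridRetriever:
    def __init__(self, documents):
        self.documents = documents
        # Sparse index: TfidfVectorizer L2-normalizes rows by default, so a
        # dot product with the query vector is a cosine similarity in [0, 1]
        self.tfidf = TfidfVectorizer()
        self.tfidf_matrix = self.tfidf.fit_transform(documents)
        # Dense index: one sentence embedding per document
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.embeddings = self.embedder.encode(documents)

    def search(self, query, alpha=0.7, top_k=None):
        """Return (document, score) pairs, best first.

        alpha weights the dense score and (1 - alpha) the sparse score;
        top_k=None returns every document.
        """
        # Sparse retrieval (TF-IDF); `@` is matrix multiplication for
        # both scipy sparse matrices and sparse arrays
        query_vec = self.tfidf.transform([query])
        sparse_scores = (query_vec @ self.tfidf_matrix.T).toarray()[0]

        # Dense retrieval (embeddings); cosine similarity lies in [-1, 1]
        query_embedding = self.embedder.encode([query])
        dense_scores = cosine_similarity(query_embedding, self.embeddings)[0]

        # Combine scores with a weighted sum and sort in descending order
        combined_scores = alpha * dense_scores + (1 - alpha) * sparse_scores
        top_indices = np.argsort(combined_scores)[::-1][:top_k]

        return [(self.documents[i], float(combined_scores[i]))
                for i in top_indices]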

# Usage
documents = [...]  # Your document collection
retriever = HybridRetriever(documents)
results = retriever.search("RAG implementation")
            
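Dynamic adjustment (step 4 of the overview) can start as simply as choosing `alpha` per query instead of fixing it at 0.7. The sketch below is a hypothetical rule-based heuristic: the function `choose_alpha`, its regular expressions, and the 0.3/0.7 weights are illustrative assumptions, not tuned values. It leans toward sparse retrieval when a query contains exact-match signals such as quoted phrases, codes with digits, or code identifiers, and otherwise keeps the dense-leaning default.

```python
import re

# Hypothetical signals that a query depends on exact tokens
# (error codes, identifiers, quoted phrases) rather than meaning.
_EXACT_MATCH_PATTERNS = [
    re.compile(r'"[^"]+"'),             # quoted phrase
    re.compile(r"\b\w*\d\w*\b"),        # token containing a digit, e.g. E1234, v2
    re.compile(r"\b\w+_\w+\b"),         # snake_case identifier
    re.compile(r"\b[a-z]+[A-Z]\w*\b"),  # camelCase identifier
]


def choose_alpha(query, default=0.7, keyword_alpha=0.3):
    """Pick the dense weight `alpha` for a hybrid search call.

    Illustrative heuristic only: favor sparse (keyword) retrieval when the
    query contains an exact-match signal, otherwise use the default weight.
    """
    if any(p.search(query) for p in _EXACT_MATCH_PATTERNS):
        return keyword_alpha
    return default


for q in ["how do retrieval systems handle synonyms",
          'error "connection reset" in fetch_docs',
          "RAG implementation"]:
    print(q, "->", choose_alpha(q))
```

With the retriever above, a call would look like `retriever.search(query, alpha=choose_alpha(query))`. In practice the weights would be tuned against relevance judgments for your own queries rather than hard-coded.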

When to Use