Query Rewriting Pattern

Overview

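Query Rewriting transforms the user's query before it reaches the retriever, so that its wording better matches how relevant documents are phrased. Common transformations: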

  1. Query expansion with synonyms and related terms
  2. Query simplification and normalization
  3. Spelling correction and typo handling
  4. Contextual query refinement
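
The first three transformations can be sketched without any model, as a rough illustration only. The vocabulary, the synonym table, and the function names below are hypothetical placeholders; a real system would derive them from its own corpus or a thesaurus. Spelling correction here uses fuzzy matching from Python's standard difflib module, which is one simple option among many.

```python
import difflib
import re

# Hypothetical toy data, for illustration only.
VOCABULARY = ["how", "does", "rag", "work", "retrieval", "augmented", "generation"]
SYNONYMS = {
    "rag": ["retrieval augmented generation"],
    "work": ["function", "operate"],
}

def normalize(query: str) -> list[str]:
    """Simplification: lowercase and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())

def correct_spelling(tokens, vocabulary=VOCABULARY, cutoff=0.8):
    """Typo handling: replace unknown tokens with the closest vocabulary word."""
    corrected = []
    for token in tokens:
        if token in vocabulary:
            corrected.append(token)
            continue
        # get_close_matches returns at most n candidates scoring >= cutoff.
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected

def expand(tokens, synonyms=SYNONYMS):
    """Expansion: the base query plus one variant per synonym substitution."""
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        for alternative in synonyms.get(token, []):
            variants.append(" ".join(tokens[:i] + [alternative] + tokens[i + 1:]))
    return variants

def rewrite_rule_based(query: str) -> list[str]:
    """Chain normalization, spelling correction, and expansion."""
    return expand(correct_spelling(normalize(query)))
```

With this toy data, rewrite_rule_based("How does retrival work?") corrects "retrival" to "retrieval" and returns the base query plus the "function" and "operate" variants. Contextual refinement (item 4) is not covered, since it depends on conversation history or user state.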

Implementation Example


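# Requires: pip install transformers sentencepiece torch spacy
#           python -m spacy download en_core_web_sm
import spacy
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

class QueryRewriter:
    def __init__(self):
        # Note: the stock "t5-small" checkpoint was not trained on a
        # "rewrite query:" task. Treat it as a placeholder; useful rewrites
        # need a checkpoint fine-tuned for query rewriting.
        self.tokenizer = T5Tokenizer.from_pretrained("t5-small")
        self.model = T5ForConditionalGeneration.from_pretrained("t5-small")
        self.model.eval()  # inference only
        self.nlp = spacy.load("en_core_web_sm")

    def rewrite(self, query):
        # Basic preprocessing: lemmatize, dropping punctuation and whitespace tokens
        doc = self.nlp(query)
        normalized = " ".join(
            token.lemma_ for token in doc
            if not (token.is_punct or token.is_space)
        )

        # Generate rewritten queries with beam search
        input_text = f"rewrite query: {normalized}"
        input_ids = self.tokenizer(input_text, return_tensors="pt").input_ids

        with torch.no_grad():  # no gradients needed at inference time
            outputs = self.model.generate(
                input_ids,
                max_length=50,
                num_return_sequences=3,  # must not exceed num_beams
                num_beams=5,
            )

        # Decode each returned beam into a plain-text query
        return [self.tokenizer.decode(output, skip_special_tokens=True)
                for output in outputs]

# Usage
rewriter = QueryRewriter()
rewritten_queries = rewriter.rewrite("How does RAG work?")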
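
Beam search can return near-duplicate or empty strings, so it may help to clean the candidate list before sending it to the retriever. The helper below is a minimal sketch under that assumption; merge_candidates and its max_queries parameter are hypothetical names, not part of any library. It keeps the original query first so retrieval never depends on the rewrites alone.

```python
def merge_candidates(original, candidates, max_queries=4):
    """Return the original query followed by unique, non-empty rewrites."""
    seen = set()
    merged = []
    for text in [original, *candidates]:
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        key = cleaned.casefold()          # case-insensitive duplicate check
        if not cleaned or key in seen:
            continue
        seen.add(key)
        merged.append(cleaned)
    return merged[:max_queries]
```

For example, merge_candidates("How does RAG work?", rewritten_queries) would yield the original query plus up to three distinct rewrites.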

When to Use