Overview
Query rewriting improves retrieval by transforming the user's query before it is sent to the retriever. Common techniques:
- Query expansion with synonyms and related terms
- Query simplification and normalization
- Spelling correction and typo handling
- Contextual query refinement
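The first three techniques above can be sketched without any model at all. The following is a minimal, rule-based illustration using only the Python standard library. The `VOCABULARY` list, the `SYNONYMS` table, and all function names are hypothetical placeholders; a real system would derive them from its own index or a thesaurus.

```python
import difflib
import re

# Hypothetical toy resources (assumptions for illustration only).
VOCABULARY = ["retrieval", "augmented", "generation", "query",
              "rewriting", "how", "does", "work"]
SYNONYMS = {"work": ["function", "operate"], "retrieval": ["search"]}


def normalize(query):
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", query.lower())
    return cleaned.split()


def correct_spelling(tokens, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace each out-of-vocabulary token with its closest vocabulary entry, if any."""
    corrected = []
    for token in tokens:
        if token in vocabulary:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


def expand(tokens, synonyms=SYNONYMS):
    """Return the base query plus one variant per single synonym substitution."""
    variants = [" ".join(tokens)]
    for i, token in enumerate(tokens):
        for alt in synonyms.get(token, []):
            variants.append(" ".join(tokens[:i] + [alt] + tokens[i + 1:]))
    return variants


def rule_based_rewrite(query):
    """Normalize, spell-correct, then expand a query into several variants."""
    return expand(correct_spelling(normalize(query)))


variants = rule_based_rewrite("How does retreival work?")
```

Here the misspelled "retreival" is mapped to "retrieval" before expansion, so every variant is built from the corrected query. Contextual refinement is not covered by this sketch, since it depends on conversation history.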
Implementation Example
from transformers import T5ForConditionalGeneration, T5Tokenizer
import spacy


class QueryRewriter:
    def __init__(self):
        # Note: the stock "t5-small" checkpoint is not fine-tuned on a
        # "rewrite query:" task, so treat it as a placeholder and substitute
        # a checkpoint trained for query rewriting to get useful output.
        self.tokenizer = T5Tokenizer.from_pretrained("t5-small")
        self.model = T5ForConditionalGeneration.from_pretrained("t5-small")
        self.nlp = spacy.load("en_core_web_sm")

    def rewrite(self, query):
        # Basic preprocessing: lemmatize each token
        doc = self.nlp(query)
        normalized = " ".join(token.lemma_ for token in doc)

        # Generate rewritten queries with beam search
        input_text = f"rewrite query: {normalized}"
        inputs = self.tokenizer(input_text, return_tensors="pt")
        outputs = self.model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=50,
            num_return_sequences=3,
            num_beams=5,  # must be >= num_return_sequences
        )
        return [self.tokenizer.decode(output, skip_special_tokens=True)
                for output in outputs]


# Usage
rewriter = QueryRewriter()
rewritten_queries = rewriter.rewrite("How does RAG work?")
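Generated rewrites may come back empty, or may differ from each other only in case or whitespace. A small post-processing step, sketched below, keeps the original query first and drops such candidates. The helper name `clean_rewrites` is an assumption for illustration and is not part of any library.

```python
def clean_rewrites(original, candidates):
    """Return the original query followed by unique, non-empty candidates."""
    results = [original.strip()]
    seen = {original.strip().lower()}
    for candidate in candidates:
        text = " ".join(candidate.split())  # collapse repeated whitespace
        key = text.lower()                  # compare case-insensitively
        if text and key not in seen:
            seen.add(key)
            results.append(text)
    return results


cleaned = clean_rewrites(
    "How does RAG work?",
    ["how does rag work?", "", "How  RAG works", "how rag works"],
)
```

Keeping the original query in the list means retrieval never does worse than the unrewritten baseline when the generated variants are poor.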
When to Use
- When dealing with ambiguous or vague queries
- For improving recall in information retrieval
- When users may not know the exact terminology
- For handling spelling mistakes and typos
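To illustrate the recall point above, the sketch below runs each query variant through a retriever and merges the hits. The `keyword_search` function is a toy stand-in for a real retriever, and the documents and function names are hypothetical.

```python
def keyword_search(query, documents):
    """Toy retriever (assumption): ids of documents sharing any query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in documents.items()
            if terms & set(text.lower().split())]


def search_with_rewrites(queries, documents, search=keyword_search):
    """Run every query variant and merge the hits, keeping first-seen order."""
    merged = []
    seen = set()
    for query in queries:
        for doc_id in search(query, documents):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


documents = {
    "d1": "retrieval augmented generation combines search with a language model",
    "d2": "rag pipelines fetch passages before generating an answer",
    "d3": "baking bread requires flour and yeast",
}

original_only = search_with_rewrites(["rag"], documents)
with_rewrites = search_with_rewrites(
    ["rag", "retrieval augmented generation"], documents
)
```

With only the original query, the document that spells out the acronym is missed; adding the expanded variant retrieves it as well. The trade-off is that each extra variant can also pull in irrelevant documents, so precision may drop as recall rises.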