Query Optimization and Retrieval Quality
Query rewriting for better retrieval
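A user's raw question is sometimes a poor search query: it may be too short, ambiguous, or conversational (for example, a follow-up like "how do I reset it?" that only makes sense alongside earlier turns). Use the LLM itself to rewrite the question into a clearer, standalone, search-optimized query before embedding it.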
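The rewriting step can be sketched as a small wrapper around whatever model you already call. This is a minimal illustration, not a definitive implementation: `rewrite_query`, `build_rewrite_prompt`, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns the model's text) are all assumptions made for this example.

```python
from typing import Callable, Sequence

# Hypothetical prompt template; adjust the wording for your own model.
REWRITE_PROMPT = (
    "Rewrite the user's question as a standalone search query.\n"
    "Resolve pronouns using the conversation, keep key terms, "
    "and return only the rewritten query.\n\n"
    "Conversation:\n{history}\n\n"
    "Question: {question}\n"
    "Search query:"
)


def build_rewrite_prompt(question: str, history: Sequence[str] = ()) -> str:
    """Fill the template with the question and any earlier turns."""
    history_text = "\n".join(history) if history else "(none)"
    return REWRITE_PROMPT.format(history=history_text, question=question.strip())


def rewrite_query(
    question: str,
    llm: Callable[[str], str],  # assumed interface: prompt in, text out
    history: Sequence[str] = (),
) -> str:
    """Ask the model for a clearer query; fall back to the raw question."""
    prompt = build_rewrite_prompt(question, history)
    rewritten = llm(prompt).strip().strip('"')
    # If the model returns nothing usable, search with the original text.
    return rewritten or question.strip()
```

Embed the returned string instead of the raw question. Falling back to the original question keeps retrieval working even when the rewrite call returns an empty response.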
Choosing the right top-K
Retrieving too few chunks (K=1-2) risks missing relevant information; retrieving too many (K=20+) dilutes the prompt with noise and increases cost. Start with K=3-5 and tune based on observed answer quality.
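To make the role of K concrete, here is a small in-memory sketch of top-K retrieval by cosine similarity. It assumes chunks are stored as `(text, vector)` pairs; the names `cosine` and `top_k` are illustrative, and a real vector store would expose K as a query parameter rather than requiring this code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 4,  # starting point in the suggested 3-5 range
) -> List[Tuple[float, str]]:
    """Return the k most similar chunks as (score, text), best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because the results come back sorted, raising K only appends lower-scoring chunks, which is where the extra noise and cost come from.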
Re-ranking retrieved results
After initial vector retrieval, an optional re-ranking step can reorder the results by their relevance to the specific question. A specialized re-ranker model scores each retrieved chunk against the query, which helps because vector similarity is only an approximation of relevance.
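The two-stage pattern can be sketched as follows: retrieve a generous candidate set, then reorder it with a separate scoring function. In this illustration `rerank` and its `score` parameter are hypothetical names, and `keyword_overlap` is only a toy stand-in for a real re-ranker model so that the example runs without extra dependencies.

```python
from typing import Callable, List, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],  # assumed: (query, text) -> relevance
    top_n: int = 3,
) -> List[str]:
    """Reorder retrieved chunks by the scorer and keep the best top_n."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_n]


def keyword_overlap(query: str, text: str) -> float:
    """Toy scorer: fraction of query words that appear in the text.

    A stand-in only; in practice this would call a re-ranker model.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0
```

A common arrangement is to retrieve more candidates than you intend to use (say, 20), re-rank them, and pass only the top few to the prompt.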
Key Takeaways
- Rewrite ambiguous user queries into clearer search queries before embedding.
- Tune top-K: too few chunks miss information, too many add noise and cost.
- Re-ranking can improve relevance beyond raw vector similarity.
- Retrieval quality directly determines final answer quality.
Tune top-K on a real query
Run the same query against your vector store with K=2, K=5, and K=10, then compare the final answers to see which setting gives the best result.
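One way to set up this exercise is a small harness that runs the same query at each K and collects the answers side by side. This is a sketch under stated assumptions: `compare_k` is a hypothetical helper, `retrieve(query, k)` stands for your vector store lookup, and `generate(query, chunks)` stands for your LLM call.

```python
from typing import Callable, Dict, List, Sequence


def compare_k(
    query: str,
    retrieve: Callable[[str, int], List[str]],   # assumed: your vector store lookup
    generate: Callable[[str, List[str]], str],   # assumed: your LLM answer call
    ks: Sequence[int] = (2, 5, 10),
) -> Dict[int, dict]:
    """Run the same query at each K and collect answers for comparison."""
    results: Dict[int, dict] = {}
    for k in ks:
        chunks = retrieve(query, k)
        results[k] = {
            "num_chunks": len(chunks),  # may be below k for a small store
            "answer": generate(query, chunks),
        }
    return results
```

Print the answers next to each other and judge them against the same criteria, such as correctness, completeness, and whether irrelevant material crept in.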