Query Optimization and Retrieval Quality

Query rewriting for better retrieval

A user's raw question is sometimes a poor search query: it may be too short, ambiguous, or conversational. Have the LLM itself rewrite the question into a clearer, self-contained search query before embedding it.
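The rewriting step can be sketched as a small wrapper around a model call. This is a minimal illustration, not a definitive implementation: `rewrite_query`, `REWRITE_PROMPT`, and `fake_llm` are hypothetical names, and `llm` stands for whatever function sends a prompt to your model and returns its text.

```python
from typing import Callable

# Hypothetical prompt template; adjust the wording for your own model.
REWRITE_PROMPT = (
    "Rewrite the user's question as a clear, self-contained search query. "
    "Keep every concrete detail and return only the rewritten query.\n\n"
    "Question: {question}\n"
    "Search query:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Return a search-optimized version of `question`.

    `llm` is assumed to take a prompt string and return the model's text.
    """
    cleaned = question.strip()
    prompt = REWRITE_PROMPT.format(question=cleaned)
    # Models often wrap the answer in whitespace or quotes; strip both.
    rewritten = llm(prompt).strip().strip('"').strip()
    # Fall back to the raw question if the model returns nothing usable.
    return rewritten or cleaned


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns a canned rewrite.
    return '"How do I reset my account password?"'


if __name__ == "__main__":
    print(rewrite_query("cant log in, pw reset??", fake_llm))
```

The fallback to the original question is a design choice in this sketch: if the rewrite fails, retrieval still runs on the user's own words instead of on an empty query.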

Choosing the right top-K

Retrieving too few chunks (K=1-2) risks missing relevant information; retrieving too many (K=20+) dilutes the prompt with noise and increases cost. Start with K=3-5 and tune based on observed answer quality.
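To make the role of K concrete, here is a small sketch of top-K retrieval by cosine similarity. It uses hand-written two-dimensional vectors instead of real embeddings, and the names `cosine`, `top_k`, and `CHUNKS` are assumptions made for illustration; a real vector store would do this search for you.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=4):
    """Return the k highest-scoring (score, text) pairs, best first.

    `chunks` is a list of (text, vector) pairs.
    """
    scored = ((cosine(query_vec, vec), text) for text, vec in chunks)
    return heapq.nlargest(k, scored)


# Toy data: made-up 2-D vectors standing in for real embeddings.
CHUNKS = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.0, 1.0]),
    ("refund deadlines", [0.9, 0.1]),
    ("company history", [-1.0, 0.0]),
]

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, [text for _, text in top_k([1.0, 0.0], CHUNKS, k)])
```

With K=1 the "refund deadlines" chunk is missed; with K=4 the unrelated "company history" chunk is included. That is the trade-off between missing information and adding noise.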

Re-ranking retrieved results

After initial vector retrieval, an optional re-ranking step can reorder the results by their relevance to the specific question asked. A specialized re-ranker model scores each retrieved chunk against the query, which helps because vector similarity alone does not always match true relevance.
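The two-stage shape (retrieve first, then reorder) can be sketched as follows. The `score` argument stands for a re-ranker model; `overlap_score` is only a toy word-overlap stand-in so the example runs without a model, and all names here are hypothetical.

```python
from typing import Callable, List


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[str]:
    """Reorder retrieved chunks by `score(query, chunk)` and keep the best `top_n`."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, text: str) -> float:
    # Toy stand-in for a re-ranker model: the fraction of query words found in the text.
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


if __name__ == "__main__":
    # Pretend these came back from vector retrieval, in similarity order.
    retrieved = [
        "Password rules and length limits",
        "How to reset your password by email",
        "Email notification settings",
    ]
    for text in rerank("reset password email", retrieved, overlap_score, top_n=2):
        print(text)
```

A common pattern is to retrieve more candidates than you need and let the re-ranker cut the list down to the final K that goes into the prompt.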

Key Takeaways

  • Rewrite ambiguous user queries into clearer search queries before embedding.
  • Tune top-K — too few misses information, too many adds noise and cost.
  • Re-ranking can improve relevance beyond raw vector similarity.
  • Retrieval quality directly determines final answer quality.

Tune top-K on a real query

Run the same query against your vector store with K=2, K=5, and K=10, and compare which setting produces the best final answer.
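One way to organize this exercise is a small harness that runs the same query at each K and collects the answers side by side. This is a sketch under assumptions: `retrieve` and `answer` are placeholders for your own vector store lookup and LLM call, and `fake_retrieve`, `fake_answer`, and `CORPUS` exist only so the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence


def compare_k(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    answer: Callable[[str, List[str]], str],
    ks: Sequence[int] = (2, 5, 10),
) -> Dict[int, str]:
    """Run the same query once per K and return {K: final answer}."""
    results = {}
    for k in ks:
        chunks = retrieve(query, k)          # top-K chunks from your store
        results[k] = answer(query, chunks)   # answer generated from those chunks
    return results


# Placeholder corpus and functions; replace with your real retrieval and LLM calls.
CORPUS = [f"chunk {i}" for i in range(6)]


def fake_retrieve(query: str, k: int) -> List[str]:
    return CORPUS[:k]


def fake_answer(query: str, chunks: List[str]) -> str:
    return f"{len(chunks)} chunks used"


if __name__ == "__main__":
    for k, text in compare_k("What is the refund window?", fake_retrieve, fake_answer).items():
        print(f"K={k}: {text}")
```

Read the collected answers next to each other and note where extra chunks stop adding useful detail.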