Query Optimization and Retrieval Quality

Query rewriting for better retrieval

A user's raw question is sometimes a poor search query: too short, ambiguous, or conversational. Have the LLM rewrite it into a clearer, search-optimized form before embedding it.
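
The rewriting step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `llm` is a hypothetical callable that takes a prompt string and returns the model's text, and the prompt wording is an assumption you would adapt to your own model and domain.

```python
from typing import Callable

# Assumed prompt wording; tune it for your own model and documents.
REWRITE_PROMPT = (
    "Rewrite the user's question as a single, self-contained search query. "
    "Resolve vague references and drop conversational filler. "
    "Return only the rewritten query.\n\n"
    "Question: {question}\n"
    "Search query:"
)


def rewrite_query(question: str, llm: Callable[[str], str]) -> str:
    """Return a search-optimized version of `question`.

    `llm` is a hypothetical callable (prompt in, completion text out).
    """
    cleaned = question.strip()
    rewritten = llm(REWRITE_PROMPT.format(question=cleaned)).strip()
    # If the model returns nothing usable, fall back to the original question.
    return rewritten or cleaned
```

The rewritten string, not the raw question, is what gets embedded and sent to the vector store. Falling back to the original question keeps retrieval working when the rewrite call returns an empty response.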

Choosing the right top-K

Retrieving too few chunks (K=1-2) risks missing relevant information; too many (K=20+) dilutes the prompt with noise and increases cost. Start with K=3-5 and tune based on observed answer quality.
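
To make the role of K concrete, here is a small sketch of top-K selection by cosine similarity in plain Python. It assumes the chunks are held in memory as (text, vector) pairs; a real vector store would do this search for you, and the function names here are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query as (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Raising `k` only extends the list further down the ranking, so each extra chunk is less similar than the ones before it. That is why large values tend to add noise rather than new information.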

Re-ranking retrieved results

After initial vector retrieval, an optional re-ranking step can reorder the results using a specialized re-ranker model. Vector similarity measures how close two embeddings are, which does not always match how relevant a chunk is to the specific question asked.
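
The retrieve-then-re-rank pattern can be sketched as follows. The `score_fn` parameter is a hypothetical stand-in for a re-ranker model: any callable that scores a (question, chunk) pair. The `overlap_score` function is a toy word-overlap scorer included only so the example runs on its own; it is not a real re-ranker.

```python
from typing import Callable, List


def rerank(
    question: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    keep: int = 3,
) -> List[str]:
    """Reorder retrieved chunks by score_fn and keep the best `keep`."""
    scored = [(score_fn(question, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:keep]]


def overlap_score(question: str, text: str) -> float:
    """Toy scorer: fraction of question words that appear in the chunk."""
    q_words = set(question.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0
```

A common arrangement is to retrieve a generous candidate set from the vector store, re-rank it, and pass only the top few chunks to the prompt. The re-ranker sees the question and the chunk together, which is what lets it judge relevance more precisely than embedding distance alone.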

Key Takeaways

Rewrite ambiguous user queries into clearer search queries before embedding.
Tune top-K: too few chunks miss information, too many add noise and cost.
Re-ranking can improve relevance beyond raw vector similarity.
Retrieval quality directly determines final answer quality.

Tune top-K on a real query

Run the same query against your vector store with K=2, K=5, and K=10, then compare the final answers to see which value of K gives the best quality.
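
One way to set up this comparison is a small harness like the sketch below. Both callables are hypothetical placeholders for your own pipeline: `retrieve(question, k)` should return the top-k chunk texts from your vector store, and `answer(question, chunks)` should return the LLM's final answer given those chunks.

```python
from typing import Callable, Dict, Iterable, List


def compare_k(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    answer: Callable[[str, List[str]], str],
    ks: Iterable[int] = (2, 5, 10),
) -> Dict[int, dict]:
    """Run the same question at several K values and collect the answers."""
    results = {}
    for k in ks:
        chunks = retrieve(question, k)
        results[k] = {
            "chunks": len(chunks),  # how many chunks actually came back
            "answer": answer(question, chunks),
        }
    return results


def print_report(results: Dict[int, dict]) -> None:
    """Print the answers one after another for manual review."""
    for k, row in results.items():
        print(f"K={k} ({row['chunks']} chunks)")
        print(row["answer"])
        print("-" * 40)
```

Judging which answer is best is left to you: read the answers in sequence and note where added chunks start to hurt rather than help.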
