What is RAG? Retrieval-Augmented Generation
The problem RAG solves
A model's knowledge is frozen at training time and doesn't include your private documents, latest data, or company-specific information. RAG lets the model look up relevant information from your own knowledge base before answering.
The basic RAG flow
1. A user asks a question.
2. The system searches your knowledge base for relevant chunks of text.
3. Those chunks are inserted into the prompt as context.
4. The model answers using both its training and this retrieved context.
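The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the knowledge base, the helper names (`tokenize`, `retrieve`, `build_prompt`), and the word-overlap scoring are all assumptions made for the example. Real systems usually retrieve with embeddings and a vector index rather than shared words, and step 4 (calling the model) is left as a comment because it depends on the model you use.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Step 2: rank chunks by how many words they share with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(chunk)), chunk) for chunk in chunks]
    # Keep only chunks with at least one shared word, highest score first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Step 3: insert the retrieved chunks into the prompt as context."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical knowledge base, invented for this example.
KNOWLEDGE_BASE = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
    "Support is available by email from Monday to Friday.",
]

question = "How long do refunds take?"        # Step 1: the user's question
chunks = retrieve(question, KNOWLEDGE_BASE)   # Step 2: search the knowledge base
prompt = build_prompt(question, chunks)       # Step 3: inject chunks as context
# Step 4: send `prompt` to the model of your choice.
print(prompt)
```

Only the refund chunk shares a word with the question, so it is the only context that reaches the prompt.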
Why RAG beats fine-tuning for most use cases
RAG lets you update your knowledge base instantly (just add a document) without retraining a model, and it lets the AI cite exactly which document an answer came from. Fine-tuning, by contrast, bakes knowledge in permanently and opaquely.
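A small sketch can make both advantages concrete. The `KnowledgeBase` class, the file names, and the stopword list below are hypothetical, and the word-overlap search stands in for whatever retriever a real system would use. The point is only that adding a document is a data change, and that keeping the source name next to each chunk is what makes citation possible.

```python
import re

STOPWORDS = {"what", "is", "the", "a", "on", "at", "of"}

def keywords(text: str) -> set[str]:
    """Return the lowercase words in the text, minus common stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

class KnowledgeBase:
    """A toy in-memory knowledge base that remembers each document's source."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}

    def add_document(self, source: str, text: str) -> None:
        # Updating knowledge is a data change; no model is retrained.
        self.documents[source] = text

    def search(self, question: str) -> list[tuple[str, str]]:
        """Return (source, text) pairs that share a keyword with the question."""
        q_words = keywords(question)
        hits = []
        for source, text in self.documents.items():
            overlap = len(q_words & keywords(text))
            if overlap:
                hits.append((overlap, source, text))
        hits.sort(key=lambda h: h[0], reverse=True)
        return [(source, text) for _, source, text in hits]

def format_context(hits: list[tuple[str, str]]) -> str:
    """Label each chunk with its source so an answer can cite it."""
    return "\n".join(f"[{source}] {text}" for source, text in hits)

kb = KnowledgeBase()
kb.add_document("holidays.txt", "The office is closed on public holidays.")
question = "What is the parental leave policy?"
before = kb.search(question)   # nothing relevant has been added yet

kb.add_document("leave-policy.txt", "Parental leave is 16 weeks at full pay.")
after = kb.search(question)    # the new document is searchable immediately
print(format_context(after))
```

Because each chunk carries a bracketed source label into the prompt, the model can be asked to name the label it relied on.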
Key Takeaways
- RAG lets a model answer using your own documents, not just its training data.
- The flow is: question → retrieve relevant chunks → inject as context → answer.
- RAG allows instant knowledge updates without retraining a model.
- RAG enables citing exact sources, unlike opaque fine-tuning.
Trace a manual RAG flow
Pick a document and a question about it. Manually find the relevant paragraph, then write a prompt that includes that paragraph as context before the question.
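As one possible shape for the finished prompt, here is a sketch in which the "retrieval" is the paragraph you picked by hand. The paragraph, the question, and the template wording are invented for illustration; substitute your own document and adjust the instructions to suit your model.

```python
# Paragraph chosen by hand: this is the manual "retrieval" step.
paragraph = (
    "Employees may work remotely up to three days per week, "
    "subject to manager approval."
)
question = "How many days per week can employees work remotely?"

# The context comes before the question, as the exercise describes.
PROMPT_TEMPLATE = """Use only the context below to answer the question.
If the context does not contain the answer, say so.

Context:
{context}

Question: {question}
Answer:"""

prompt = PROMPT_TEMPLATE.format(context=paragraph, question=question)
print(prompt)
```

Paste the printed prompt into a model, then try the same question without the context block and compare the two answers.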