RAG — Retrieval-Augmented Generation — is the pattern of fetching relevant information before generating a response. It's how you give an LLM access to knowledge it wasn't trained on: your meeting notes, client docs, internal wikis.
Why RAG
The LLM has a knowledge cutoff. It doesn't know about your clients, your meetings, or your internal processes. RAG bridges this gap:
User query: "What did we agree on pricing with Comfama?"
Without RAG: "I don't have information about your Comfama pricing."
With RAG:
1. Search meeting notes for "Comfama" + "pricing"
2. Retrieve relevant chunks from 3 meetings
3. Generate: "In the Dec 12 meeting, you agreed on..."
You already use RAG constantly. When Claude Code reads your knowledge/pricing.md before answering a pricing question, that's manual RAG. When it searches Obsidian for meeting notes, that's RAG with grep.
The RAG Pipeline
Query → Retrieval → Augmentation → Generation
            │              │               │
      Find relevant    Add to the    Generate with
        documents        prompt       full context
Retrieval methods:
Keyword search (BM25): Exact word matching. Fast, predictable, fails on synonyms.
Vector search: Convert query and documents to embeddings, find nearest neighbors. Handles semantic similarity but can be noisy.
Hybrid: Combine both. Keyword for precision, vectors for recall. Best of both worlds.
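A simple way to combine the two lists is reciprocal rank fusion (RRF): each retriever contributes a score based on where a document ranks in its own results, so documents found by both retrievers rise to the top. A sketch with hypothetical document IDs, assuming you already have ranked output from a keyword and a vector retriever:

```python
# Reciprocal rank fusion: merge ranked lists from multiple retrievers.
# k=60 is the constant commonly used with RRF to damp the influence
# of the very top ranks; the document IDs below are made up.

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; a doc's score is the sum of 1/(k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["comfama-dec12.md", "comfama-jan08.md"]           # exact matches
vector_hits = ["fee-structure.md", "comfama-dec12.md", "q4.md"]   # semantic matches
print(rrf([keyword_hits, vector_hits]))
# "comfama-dec12.md" ranks first: both retrievers found it
```

RRF is attractive because it needs no score normalization — BM25 scores and cosine similarities live on different scales, but ranks are always comparable.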
When RAG is overkill:
Your data fits in the context window (< 100K tokens). Just include it directly.
The data is static and small. Put it in CLAUDE.md or knowledge/.
The model already knows the answer (general knowledge).
Your Obsidian vault has 279+ meeting notes. That's too large for the context window but perfect for RAG — retrieve only the relevant meetings per query.
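Chunking is what makes per-query retrieval work at that scale: you pull in only the relevant sections of the relevant meetings, not whole files. A minimal sketch that splits a markdown note on `## ` headings (the note content is invented):

```python
# Split a markdown meeting note into heading-delimited chunks so the
# retriever can return sections rather than whole files.

def chunk_by_heading(text: str) -> list[str]:
    """Each '## ' heading starts a new chunk; any preamble is its own chunk."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

note = """# Comfama 2024-12-12
## Pricing
Agreed on tiered pricing with a 10% discount.
## Action items
Send contract draft by Friday."""
print(chunk_by_heading(note))  # 3 chunks: title, Pricing, Action items
```

Heading-based splitting works well for meeting notes because the headings already mark topic boundaries; for unstructured text you would fall back to fixed-size chunks with overlap.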
Test yourself
❓ Quiz 1
When would you NOT use RAG?
If the data fits in context, just include it. RAG adds complexity. Use it when data is too large for the context window or when you need dynamic retrieval.
Retrieval Quality Matters More Than Generation
A common mistake: obsessing over the generation prompt while the retriever returns garbage. If you retrieve the wrong documents, no amount of prompt engineering fixes the output.
The rule: 70% of RAG quality comes from retrieval. Get the right chunks in front of the model, and generation usually takes care of itself.
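Because retrieval dominates, it pays to measure it directly rather than eyeballing final answers. A common metric is recall@k: of the documents you know should be retrieved for a query, what fraction shows up in the top k results? A small sketch with made-up queries and results:

```python
# recall@k: fraction of known-relevant docs appearing in the top-k retrieved.
# The relevant set and retrieved list below are hypothetical eval data.

def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    """1.0 means every relevant doc was in the top k; 0.0 means none were."""
    hits = relevant & set(retrieved[:k])
    return len(hits) / len(relevant)

relevant = {"comfama-dec12.md", "comfama-jan08.md"}
retrieved = ["comfama-dec12.md", "q4-planning.md", "comfama-jan08.md", "standup.md"]
print(recall_at_k(relevant, retrieved, k=3))  # 1.0: both relevant docs in top 3
print(recall_at_k(relevant, retrieved, k=1))  # 0.5: only one in the top 1
```

A handful of queries with hand-labeled relevant documents is enough to catch a broken retriever long before you touch the generation prompt.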
Test yourself
❓ Quiz 2
What accounts for most of RAG quality?
70% of RAG quality comes from retrieval. Wrong documents in = wrong answer out, regardless of prompt quality.
Apply this
⚖ Decision 1
You're building RAG over your 279 Obsidian meeting notes. Typical queries: 'what did we discuss about pricing with Comfama?' and 'what were the action items from last week?' Which retrieval strategy?
A: Vector search only. Semantic search handles 'what did we discuss about pricing?' well, but searching for 'Comfama' by embedding is unreliable — proper nouns don't embed predictably.
B ★: Hybrid search. Best of both: BM25 catches exact matches (Comfama, 2026-03-15) while vectors find conceptual matches (pricing strategy ≈ fee structure). Worth the extra complexity for your use case.
C: Keyword search only (BM25). Fast for known terms but fails on paraphrased queries. 'pricing discussion' won't match 'fee structure conversation.' Too brittle for real use.
Review
Time to consolidate what you learned.
🛠 Exercise 1
Map your Obsidian vault as a RAG system. What's the corpus (what types of documents)? What are the 5 most common queries you'd want to run against it? For each query, which retrieval method (keyword, vector, hybrid) would work best and why?