Chunking and Embeddings

Retrieval starts with how you break documents into pieces (chunking) and how you represent those pieces numerically (embeddings). These decisions determine what your RAG system can and can't find.

Chunking Strategies

Documents need to be split into chunks small enough to be relevant but large enough to contain useful context.

By headers/sections

Split at H2 or H3 boundaries. Each section becomes a chunk. Works great for structured documents like your meeting notes (which have headers for topics, action items, etc.).
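A minimal sketch of header-based splitting, assuming H2 boundaries (`## `) and ignoring any text before the first header; the `Section` shape here is illustrative, not a required schema:

```typescript
// Split a markdown document at H2 ("## ") boundaries.
// Text before the first H2 (e.g. a title line) is skipped in this sketch.
interface Section {
  section: string; // the H2 header text
  content: string; // everything until the next H2
}

function splitByH2(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  for (const line of markdown.split("\n")) {
    const match = line.match(/^##\s+(.*)/);
    if (match) {
      if (current) sections.push(current);
      current = { section: match[1].trim(), content: "" };
    } else if (current) {
      current.content += line + "\n";
    }
  }
  if (current) sections.push(current);
  return sections.map((s) => ({ ...s, content: s.content.trim() }));
}
```

Each returned section maps directly to one chunk, so topic boundaries are never cut mid-thought.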

Fixed-size with overlap

Split every N tokens with M tokens of overlap. Simple, works for any document. But you'll cut through sentences and ideas.
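A sketch of the sliding window, using whitespace-separated words as a stand-in for tokens (a real implementation would count tokens with a tokenizer such as tiktoken):

```typescript
// Fixed-size chunking with overlap. Words approximate tokens here.
function fixedSizeChunks(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each iteration
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + size).join(" "));
    if (i + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means the last few words of one chunk reappear at the start of the next, so an idea cut at a boundary still survives intact in at least one chunk.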

Semantic chunking

Use the LLM or embeddings to identify topic boundaries. More accurate but slower and more expensive.

For your Obsidian vault:

Your meeting notes already have structure — YAML frontmatter, headers, sections. Chunking by section (H2) is the obvious choice. Each chunk gets metadata from the frontmatter (date, project, attendees).

Meeting: "2025-12-12 Pricing strategy Comfama"

Chunk 1: { section: "Discussion", content: "...", date: "2025-12-12", project: "Comfama" }
Chunk 2: { section: "Action Items", content: "...", date: "2025-12-12", project: "Comfama" }
Chunk 3: { section: "Decisions", content: "...", date: "2025-12-12", project: "Comfama" }

Embeddings

Embeddings convert text into vectors — arrays of numbers that capture semantic meaning. Similar texts have similar vectors.
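"Similar vectors" is usually measured with cosine similarity — the angle between two vectors, where 1 means same direction, 0 means unrelated, and -1 means opposite. A minimal sketch:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Retrieval then reduces to: embed the query, compute similarity against every stored chunk vector, and return the top matches (vector stores like pgvector do this scan with an index instead of brute force).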

Popular embedding models:

  • text-embedding-3-small (OpenAI): Cheap, good enough for most cases, 1536 dimensions
  • text-embedding-3-large (OpenAI): Better quality, 3072 dimensions, 2x cost
  • voyage-3 (Voyage AI): Strong on code and technical content
  • Local models: nomic-embed-text, bge-small — free, run on your machine

Vector stores:

  • Supabase pgvector: You already use Supabase. Enable the pgvector extension, add an embedding column, done. Zero new infrastructure.
  • Pinecone: Managed, fast, scales well. Extra service to maintain.
  • ChromaDB: Local, good for prototyping. Not great for production.

For your stack, Supabase pgvector is the obvious choice — no new service, you already have the client set up.

The Chunk Size Tradeoff

Small chunks (100-200 tokens):

  • More precise retrieval
  • Risk losing surrounding context
  • More chunks to search
  • Better for specific facts

Large chunks (500-1000 tokens):

  • More context per chunk
  • Risk retrieving irrelevant text
  • Fewer chunks, faster search
  • Better for narrative/reasoning

For meeting notes: 300-500 tokens per chunk is a good starting point. Each section of a meeting typically falls in this range naturally.
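A quick way to sanity-check chunk sizes is the rough heuristic of ~4 characters per token for English text (a real tokenizer like tiktoken gives exact counts; the 300-500 bounds below are just this lesson's starting point):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Heuristic only — use a real tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check whether a chunk falls in the suggested 300-500 token range.
function isGoodChunkSize(text: string, min = 300, max = 500): boolean {
  const tokens = estimateTokens(text);
  return tokens >= min && tokens <= max;
}
```

Running this over your chunked vault quickly flags sections that are too short to stand alone or long enough to be worth splitting further.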

❓ Quiz 1
For your Obsidian meeting notes (which have YAML frontmatter and H2 sections), what's the best chunking strategy?
Your notes are already well-structured. Chunking by sections preserves natural topic boundaries, and the frontmatter gives you metadata (date, project, attendees) for filtering.

Review

Time to consolidate what you learned.

🎮 Chunking Strategy Picker
Click each item to move it between categories. Place all items, then click Check.
Items to sort:

  • Meeting notes with H2 sections
  • Obsidian vault entries
  • Mixed-format wikis
  • Research papers
  • Raw chat logs
  • API docs with structured headers
  • Legal contracts with nested clauses
  • CSV data dumps
  • Long unstructured emails

Categories: By Headers · Fixed-Size · Semantic
🛠 Exercise 1
Write a TypeScript function that takes a markdown meeting note (string) and returns an array of chunks. Each chunk should be an object with: content (the section text), section (the H2 header name), and metadata extracted from YAML frontmatter (date, project, attendees if present). Use gray-matter for frontmatter parsing.