Advanced RAG
Master 11 advanced RAG strategies — from re-ranking and semantic chunking to knowledge graphs, agentic retrieval, and fine-tuned embeddings.
Why take this course?
You know the basics — now master the advanced techniques production systems use to retrieve the right context every time. 11 strategies across 6 modules.
Prerequisites
This course builds on concepts from the following courses; we recommend completing them first:
Course Modules
Improve retrieval quality with re-ranking, query expansion, and multi-query strategies that get better results from the same vector store.
Learning Goals
- Explain how cross-encoder re-ranking improves two-stage retrieval over raw similarity search
- Use query expansion to let an LLM rewrite user queries for better embedding matches
- Apply multi-query RAG to generate parallel query variations for broader recall
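The third goal above can be sketched in a few lines. This is a toy multi-query flow: a canned `vary` function stands in for an LLM that paraphrases the query, and `toy_search` stands in for a real retriever; all names and documents are illustrative.

```python
# Multi-query RAG sketch: generate several query variations, search with
# each, and union the results for broader recall.

def vary(query):
    # Stand-in for an LLM paraphraser: canned rewrite rules.
    return [
        query,
        query.replace("refund", "money back"),
        "policy for " + query,
    ]

def multi_query_search(query, search):
    seen, results = set(), []
    for q in vary(query):
        for doc in search(q):
            if doc not in seen:  # de-duplicate across variations
                seen.add(doc)
                results.append(doc)
    return results

def toy_search(q):
    # Stand-in for vector search over an invented corpus.
    hits = []
    if "money back" in q:
        hits.append("doc about money-back guarantees")
    if "refund" in q:
        hits.append("doc about refunds")
    return hits

results = multi_query_search("refund eligibility", toy_search)
```

The variation that mentions "money back" recalls a document the original query alone would have missed.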
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
The Retrieval Quality Problem
Nina's RAG pipeline works — sometimes. She chunked her docs, embedded them, and wired up vector search. Then a user asks…
Re-ranking with Cross-Encoders
Re-ranking is a two-stage retrieval strategy. Stage one casts a wide net. Stage two scores each result precisely.
**Sta…
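A minimal sketch of the two-stage flow. Everything here is a toy stand-in: `embed` is a bag-of-words counter and `cross_encoder_score` is word overlap, where a real system would use an embedding model for stage one and a sentence-pair cross-encoder for stage two.

```python
# Two-stage retrieval sketch: wide net first, precise re-ranking second.

def embed(text):
    # Toy embedding: term counts over a tiny invented vocabulary.
    vocab = ["refund", "policy", "shipping", "return", "days"]
    return [text.lower().count(w) for w in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def cross_encoder_score(query, doc):
    # Stand-in for a cross-encoder: scores the (query, doc) PAIR jointly,
    # here via word overlap instead of a transformer.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def retrieve(query, corpus, k_wide=3, k_final=1):
    # Stage one: cheap similarity casts a wide net.
    q_vec = embed(query)
    wide = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)),
                  reverse=True)[:k_wide]
    # Stage two: precise pairwise scoring re-orders the candidates.
    return sorted(wide, key=lambda d: cross_encoder_score(query, d),
                  reverse=True)[:k_final]

corpus = [
    "refund policy: returns accepted within 30 days",
    "shipping policy for international orders",
    "days of operation",
]
top = retrieve("refund within 30 days", corpus)
```

The design point: cosine over pre-computed vectors is cheap enough to run against the whole store, while pairwise scoring is expensive and only runs on the shortlist.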
Query Expansion
Users write bad queries — not because they are bad at searching, but because they don't know the vocabulary of your corp…
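A sketch of the idea, with a glossary lookup standing in for the LLM rewrite step; `llm_rewrite`, the glossary, and the query are all illustrative.

```python
# Query-expansion sketch: rewrite the user's query into corpus vocabulary
# before embedding it. A real pipeline would prompt a chat model here.

def llm_rewrite(query, glossary):
    # Toy "LLM": append corpus terms matched from a glossary.
    extra = [corpus_term for user_term, corpus_term in glossary.items()
             if user_term in query.lower()]
    return query + " " + " ".join(extra) if extra else query

glossary = {"money back": "refund", "send back": "return"}
expanded = llm_rewrite("how do I get my money back", glossary)
```

The expanded query now contains "refund", so it embeds close to documents that use the corpus's own terminology.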
Move beyond fixed-size splits with context-aware chunking, late chunking, and hierarchical parent-child chunk relationships.
Learning Goals
- Compare context-aware (semantic) chunking against fixed-size splitting and explain when each is appropriate
- Describe late chunking — embedding full documents first, then splitting token embeddings
- Implement hierarchical RAG with parent-child chunk relationships: search small, return big
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
Why Chunking Strategy Matters
Most RAG tutorials gloss over chunking. "Just split every 500 tokens with 50-token overlap." It works well enough for de…
Context-Aware (Semantic) Chunking
Instead of splitting by token count, split by meaning.
Semantic chunking uses embedding similarity between consecut…
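One way to sketch the break-point logic, with a toy bag-of-words embedder in place of a real sentence-embedding model; the vocabulary, sentences, and threshold are illustrative.

```python
# Semantic-chunking sketch: start a new chunk wherever the similarity
# between consecutive sentences drops below a threshold.

def embed(sentence):
    vocab = ["cat", "dog", "pet", "tax", "income", "deduction"]
    s = sentence.lower()
    return [s.count(w) for w in vocab]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def semantic_chunks(sentences, threshold=0.2):
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = [cur]
        else:
            current.append(cur)
    chunks.append(" ".join(current))
    return chunks

chunks = semantic_chunks([
    "My cat is a pet.",
    "The dog is also a pet.",
    "Income tax has a deduction.",
    "The deduction lowers tax.",
])
```

The pet sentences land in one chunk and the tax sentences in another, even though a fixed 2-sentence split would have produced the same boundaries only by luck.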
Late Chunking
Standard chunking has a fundamental information loss problem. When you split a document and embed each chunk independent…
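A rough sketch of the mechanics, assuming a hypothetical `contextual_token_embeddings` function that returns one vector per token computed with full-document context; real late chunking relies on a long-context embedding model for that step.

```python
# Late-chunking sketch: embed the WHOLE document at the token level first,
# then split and pool the token embeddings into chunk vectors.

def contextual_token_embeddings(tokens):
    # Toy stand-in: each token vector carries a whole-document signal
    # (here, just the document length) alongside a per-token feature,
    # mimicking the way full-document attention informs every token.
    doc_signal = float(len(tokens))
    return [[float(len(tok)), doc_signal] for tok in tokens]

def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def late_chunk(tokens, chunk_size):
    # Embedding happens before splitting, so every chunk vector was
    # computed while the model could still see the rest of the document.
    token_vecs = contextual_token_embeddings(tokens)
    return [mean_pool(token_vecs[i:i + chunk_size])
            for i in range(0, len(token_vecs), chunk_size)]

tokens = "the quick brown fox jumps over".split()
chunk_vecs = late_chunk(tokens, chunk_size=3)
```

Note that both chunk vectors carry the same document-level signal, which independently embedded chunks would not.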
Use an LLM at ingestion time to generate chunk descriptions that dramatically improve retrieval relevance.
Learning Goals
- Explain contextual retrieval — prepending LLM-generated descriptions to chunks before embedding
- Evaluate the cost-quality tradeoff of LLM-enriched ingestion pipelines
- Design an ingestion pipeline that adds contextual metadata without excessive latency
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
The Lost Context Problem
Pull a random chunk from your vector store and read it in isolation. Does it make sense on its own?
Often, no. "The sys…
Contextual Retrieval
Contextual retrieval is simple: use an LLM to prepend a short description to each chunk before embedding it. The des…
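A minimal sketch, with a template standing in for the LLM description step; a real pipeline would prompt a model with the full document plus the chunk and ask for a one-sentence description.

```python
# Contextual-retrieval sketch: prepend a generated description to each
# chunk before embedding, so the embedding carries document-level context.

def describe_chunk(chunk, doc_title, section):
    # Toy "LLM": a template. The title and section names are illustrative.
    return f"From '{doc_title}', section '{section}': "

def enrich(chunk, doc_title, section):
    return describe_chunk(chunk, doc_title, section) + chunk

enriched = enrich("The system retries failed calls three times.",
                  "Payments API", "Error handling")
```

The bare chunk never says which system it describes; the enriched version does, so queries about the Payments API can now find it.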
The Enriched Ingestion Pipeline
Contextual retrieval shifts work from query time to ingestion time. You pay the cost once per chunk, but benefit on ever…
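A back-of-envelope version of that one-time cost. The chunk count, tokens per call, and per-token price below are illustrative assumptions, not real pricing.

```python
# One-time ingestion cost estimate for LLM-enriched chunking.

def ingestion_cost(num_chunks, tokens_per_call, price_per_1k_tokens):
    return num_chunks * tokens_per_call * price_per_1k_tokens / 1000

# e.g. 10,000 chunks, ~1,200 tokens per enrichment call (document context
# plus chunk plus prompt), at an assumed $0.0005 per 1K tokens:
cost = ingestion_cost(10_000, 1_200, 0.0005)
```

Under these assumptions the whole corpus costs a few dollars to enrich once, amortized over every future query.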
Combine vector similarity with graph-based entity relationships for retrieval that understands structure, not just semantics.
Learning Goals
- Describe how knowledge graphs capture entity relationships that vector search misses
- Combine vector search with graph traversal for hybrid retrieval
- Identify use cases where graph-augmented RAG significantly outperforms pure vector search
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
What Vector Search Misses
Vector search finds content that is semantically similar. But similarity isn't the only relationship that matters.
Nina…
Knowledge Graphs for RAG
A knowledge graph represents your domain as entities (nodes) and relationships (edges). Entities are things — se…
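A minimal in-memory sketch of that structure; the entity and relation names are invented examples.

```python
# Knowledge-graph sketch: entities as nodes, typed relationships as edges,
# stored as (subject, relation, object) triples in an adjacency map.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, node), ...]

    def add(self, subj, relation, obj):
        self.edges[subj].append((relation, obj))

    def neighbors(self, node, relation=None):
        # All neighbors, optionally filtered to one relation type.
        return [o for r, o in self.edges[node] if relation in (None, r)]

kg = KnowledgeGraph()
kg.add("PaymentService", "depends_on", "AuthService")
kg.add("PaymentService", "owned_by", "Team Billing")
deps = kg.neighbors("PaymentService", "depends_on")
```

"What does PaymentService depend on?" is a one-hop edge lookup here; vector similarity alone has no reliable way to answer it.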
Hybrid Vector + Graph Retrieval
In practice, you combine both retrieval paths rather than replacing one with the other.
Hybrid retrieval runs vector and gra…
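A toy sketch of the combined flow: word overlap stands in for vector search, a plain dict stands in for the graph, and all entity names and documents are illustrative.

```python
# Hybrid retrieval sketch: vector search seeds entities, graph traversal
# pulls in structurally related ones, and the union feeds the LLM.

def vector_search(query, docs, k=1):
    # Toy similarity: shared-word overlap instead of real embeddings.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

graph = {  # entity -> directly related entities (invented)
    "AuthService": ["PaymentService", "UserDB"],
}

def hybrid_retrieve(query, docs, entity_of):
    seeds = vector_search(query, docs)
    # Expand one hop in the graph from each seed document's entity.
    related = {n for doc in seeds
               for n in graph.get(entity_of.get(doc), [])}
    return seeds, sorted(related)

docs = ["AuthService handles login tokens", "Weekly cafeteria menu"]
entity_of = {"AuthService handles login tokens": "AuthService"}
seeds, related = hybrid_retrieve("login tokens expired", docs, entity_of)
```

The query only mentions tokens, yet the graph hop surfaces PaymentService and UserDB, which matter because they depend on the matched service.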
Build RAG systems that choose their own retrieval strategy and self-correct when results are poor.
Learning Goals
- Design agentic RAG where an agent selects the retrieval strategy per query
- Implement self-reflective RAG — the LLM grades retrieved chunks and re-searches on low relevance
- Compare agentic vs self-reflective approaches and when to combine them
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
Beyond Static Pipelines
Every RAG pipeline so far follows the same pattern: query in → retrieval runs → results to LLM → answer out. The same st…
Agentic RAG
In agentic RAG, the LLM is not just the answer generator — it's the retrieval strategist.
Instead of a fixed pipeli…
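A minimal routing sketch. Keyword rules stand in for the LLM's strategy choice, and the strategy table returns placeholder strings rather than running real searches; the strategy names are illustrative.

```python
# Agentic-RAG routing sketch: pick a retrieval strategy per query.

def classify_query(query):
    # Stand-in for an LLM router: keyword rules instead of a model call.
    if "related to" in query or "depends on" in query:
        return "graph"      # relationship questions -> graph traversal
    if len(query.split()) <= 3:
        return "keyword"    # short lookups -> lexical search
    return "vector"         # everything else -> semantic search

STRATEGIES = {
    "vector":  lambda q: f"vector_search({q!r})",
    "graph":   lambda q: f"graph_traversal({q!r})",
    "keyword": lambda q: f"bm25_search({q!r})",
}

def agentic_retrieve(query):
    strategy = classify_query(query)
    return strategy, STRATEGIES[strategy](query)
```

In a production system the router is itself an LLM call (often with tool/function calling), and each strategy entry invokes a real retriever.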
Self-Reflective RAG
Agentic RAG chooses the right strategy. But what if the strategy was right and the results were still bad?
**Self-refle…
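A sketch of the grade-and-retry loop. Word overlap stands in for the LLM grader, and the toy `search` and `rewrite` callables are illustrative stand-ins for real retrieval and query rewriting.

```python
# Self-reflective RAG sketch: grade retrieved chunks, and if none are
# relevant enough, rewrite the query and search again.

def grade(query, chunk):
    # Stand-in for an LLM relevance grader: fraction of query words hit.
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split())) / max(len(q), 1)

def self_reflective_rag(query, search, rewrite,
                        threshold=0.5, max_rounds=2):
    for _ in range(max_rounds):
        chunks = search(query)
        good = [c for c in chunks if grade(query, c) >= threshold]
        if good:
            return good            # relevant context found
        query = rewrite(query)     # grades too low: rewrite and retry
    return []                      # give up after max_rounds

def toy_search(q):
    if "refund" in q:
        return ["refund policy covers 30 days"]
    return ["unrelated text"]

def toy_rewrite(q):
    return q + " refund"

result = self_reflective_rag("money back policy", toy_search, toy_rewrite)
```

The first round retrieves junk, the grader rejects it, the rewritten query succeeds; a static pipeline would have passed the junk straight to the LLM.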
Train domain-specific embedding models to squeeze 5-10% more accuracy out of your retrieval pipeline.
Learning Goals
- Explain why general-purpose embeddings underperform on specialized domains
- Outline the process for fine-tuning an embedding model on domain-specific data
- Evaluate when the 5-10% accuracy gain from fine-tuned embeddings justifies the training cost
Concept Card Preview
Visuals, diagrams, and micro-interactions you'll see in this module.
The Generic Embedding Problem
General-purpose embedding models are trained on broad web text. They understand everyday language — but struggle with sp…
Fine-tuning Embedding Models
Fine-tuning teaches the embedding model your domain's language. After training, "MI" in a medical corpus embeds close to…
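A sketch of the training-data side, which is usually the hard part: contrastive triplets of (anchor, positive, hard negative). The medical-domain examples are invented, and the dict format is one common shape for such datasets, not a specific library's API.

```python
# Fine-tuning data sketch: contrastive triplets teach the model that the
# anchor should embed near the positive and far from the hard negative.

triplets = [
    # (query, relevant passage, hard negative that shares surface words)
    ("MI treatment options",
     "Myocardial infarction is managed with reperfusion therapy...",
     "Michigan travel guide: best options for a summer trip..."),
]

def to_training_example(anchor, positive, negative):
    return {"anchor": anchor, "positive": positive, "negative": negative}

dataset = [to_training_example(*t) for t in triplets]
```

The hard negative is deliberately confusable ("MI" as Michigan), which is exactly the distinction a general-purpose model fails to make.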

The 5-10% Accuracy Gain
Benchmarks show fine-tuned embeddings improve domain-specific retrieval by 5-10% on recall@k and MRR. Sounds modest…
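For reference, the two metrics in question, computed over toy rankings; the document IDs and relevance judgments are invented.

```python
# recall@k and MRR: the standard metrics for comparing retrieval quality
# before and after embedding fine-tuning.

def recall_at_k(ranked, relevant, k):
    # Fraction of the relevant set found in the top-k results.
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mrr(rankings, relevant_per_query):
    # Mean reciprocal rank of the first relevant hit per query.
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_per_query):
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1 / rank
                break
    return total / len(rankings)

r = recall_at_k(["a", "b", "c"], {"b", "d"}, 2)
m = mrr([["a", "b"], ["x", "y"]], [{"b"}, {"x"}])
```

A 5-10% lift on these numbers means the right chunk moves from rank 3 to rank 1 on a meaningful slice of queries, which compounds directly into answer quality.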