

🚀 Level 4: Master

RAG (Retrieval-Augmented Generation)

Grounding AI responses in your own data using retrieval techniques.

RAG is one of the most practical techniques for making AI work with your own data. Instead of fine-tuning a model on your documents (expensive and inflexible), RAG retrieves relevant information at query time and includes it in the prompt. The model then generates answers grounded in your actual data rather than its training knowledge.

A typical RAG pipeline: embed your documents into vectors, store them in a vector database, and at query time retrieve the most relevant chunks to include in the context. This pattern powers knowledge bases, customer support bots, code assistants, and enterprise search. Getting the retrieval right is 80% of the challenge.

Key Topics Covered
RAG Architecture
Three phases: Retrieve (find relevant documents), Augment (add them to the prompt), Generate (LLM produces grounded answer). Simple in concept, nuanced in execution.
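A minimal sketch of the three phases in Python, assuming the open-source sentence-transformers package and its all-MiniLM-L6-v2 model (an illustrative choice; any embedding model from the next section works). Generation is left as a comment because any LLM API can consume the augmented prompt:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium plans include priority email and phone support.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Phase 1: Retrieve. Cosine similarity reduces to a dot product
    # because the vectors are normalized.
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def build_prompt(query: str) -> str:
    # Phase 2: Augment. Splice the retrieved chunks into the prompt.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Phase 3: Generate. Send build_prompt(query) to any LLM API.
print(build_prompt("How long do I have to return a product?"))
```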
Embeddings
Dense vector representations of text that capture semantic meaning. Similar texts have similar vectors. Models: OpenAI text-embedding-3, Cohere embed-v3, open-source BGE and E5.
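To see "similar texts have similar vectors" concretely, here is a hedged sketch using the text-embedding-3-small model named above. It assumes the openai package and an OPENAI_API_KEY environment variable; the embed helper is ours, not part of the API:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    # Illustrative helper: one API call, one vector per input text.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

a, b, c = embed([
    "How do I reset my password?",
    "I forgot my login credentials.",
    "What is your refund policy?",
])
cosine = lambda x, y: float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(cosine(a, b))  # paraphrases: high similarity
print(cosine(a, c))  # unrelated topics: noticeably lower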
Vector Databases
Specialized databases for storing and querying embeddings. Pinecone (managed), Qdrant (open-source), Weaviate, ChromaDB (lightweight). Each optimizes for different scale and feature needs.
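A quick start with ChromaDB, the lightweight option above (a sketch; by default Chroma embeds the documents itself with a small built-in model, so no embedding code is needed):

```python
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) to keep data
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2"],
    documents=[
        "RAG retrieves context from your data at query time.",
        "Fine-tuning bakes knowledge into model weights.",
    ],
)

results = collection.query(query_texts=["How does RAG get its context?"], n_results=1)
print(results["documents"][0])  # best-matching chunk(s) for the query
```

The managed and heavier options follow the same upsert-then-query shape through their own clients.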
Chunking Strategies
How you split documents into chunks dramatically affects retrieval quality. Fixed-size, sentence-based, semantic, recursive, and document-structure-aware chunking each suit different content types.
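The simplest strategy, fixed-size chunking with overlap, as a sketch (the size and overlap defaults are illustrative, not a recommendation):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size sliding window: each chunk repeats the last `overlap`
    # characters of the previous one, so content cut at a boundary
    # still appears intact somewhere.
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "RAG retrieves relevant chunks at query time. " * 40
chunks = chunk_text(doc, size=120, overlap=20)
print(len(chunks), repr(chunks[0][:60]))
```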
Hybrid Search
Combining semantic search (embeddings) with keyword search (BM25). Hybrid catches both conceptually similar and keyword-exact matches. Most production RAG systems use hybrid search.
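Because BM25 scores and cosine similarities live on different scales, production systems often fuse ranks rather than raw scores. A minimal sketch of reciprocal rank fusion (RRF) over toy document ids; k=60 is the conventional constant:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking lists doc ids best-first. A doc's fused score is the
    # sum of 1 / (k + rank) across all rankings it appears in.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d7"]   # from embedding search
keyword  = ["d1", "d9", "d3"]   # from BM25
print(reciprocal_rank_fusion([semantic, keyword]))  # d1 and d3 rise to the top
```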
Reranking
After initial retrieval, a cross-encoder reranker scores each chunk against the query more accurately than embedding similarity can. Examples: Cohere Rerank, BGE reranker. Reranking dramatically improves retrieval precision.
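A sketch using the sentence-transformers CrossEncoder class with the commonly used ms-marco-MiniLM-L-6-v2 model (an illustrative choice; the BGE reranker loads the same way, and Cohere Rerank is an API call instead):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How long is the return window?"
candidates = [
    "Returns are accepted within 30 days of purchase.",
    "Our offices are closed on public holidays.",
    "Window installation takes two to four hours.",
]
# The cross-encoder reads query and chunk together, so it scores
# relevance more precisely than comparing two separate embeddings.
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])  # the 30-day returns chunk should score highest
```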
Advanced RAG Patterns
CRAG (Corrective RAG): verify retrieval quality before generating. Self-RAG: model decides when retrieval is needed. Graph RAG: combine vector search with knowledge graphs for richer context.
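The corrective-RAG idea reduces to a guard around retrieval. A minimal sketch in which retrieve_with_scores and web_search are trivial stubs standing in for a real vector store and fallback search, and the 0.6 threshold is illustrative:

```python
def retrieve_with_scores(query: str) -> list[tuple[str, float]]:
    return [("Returns accepted within 30 days.", 0.42)]  # stub retriever

def web_search(query: str) -> list[str]:
    return ["(web result) Returns accepted within 30 days of delivery."]  # stub

def corrective_rag(query: str, threshold: float = 0.6) -> str:
    # Keep only chunks the retriever is confident about.
    chunks = retrieve_with_scores(query)
    good = [text for text, score in chunks if score >= threshold]
    if not good:
        # Retrieval judged unreliable: correct it with a fallback source.
        good = web_search(query)
    context = "\n".join(good)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(corrective_rag("How long is the return window?"))
```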
Multi-Modal RAG
RAG beyond text: retrieving images, tables, and code snippets. Vision models can process retrieved images. Table extraction and code understanding require specialized chunking.
Evaluation
Measuring RAG quality: retrieval metrics (precision, recall, MRR) and generation metrics (faithfulness, relevance, completeness). RAGAS framework automates RAG evaluation.
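The retrieval metrics are simple enough to hand-roll on toy data (a sketch; frameworks like RAGAS compute these alongside the generation metrics automatically):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the top-k retrieved docs that are actually relevant.
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    # Reciprocal rank of the first relevant result; 0 if none appears.
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d4", "d1", "d9"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, k=3))  # 0.333...
print(mrr(retrieved, relevant))                  # 0.5 (first hit at rank 2)
```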
Common Pitfalls
Too-small chunks lose context; too-large chunks waste tokens. Poor embeddings retrieve irrelevant content. No reranking means noise in the top results. Always evaluate retrieval quality independently of generation.
Key Terms
Embedding: Dense vector representation of text that captures semantic meaning for similarity search.
Vector Database: Database optimized for storing embeddings and performing fast similarity search (Pinecone, Qdrant, Weaviate).
Chunking: The process of splitting documents into smaller pieces for embedding and retrieval.
Reranking: Second-stage scoring of retrieved results using a cross-encoder model for improved precision.