Technique

Retrieval-Augmented Generation (RAG)

RAG is a pattern in which the language model is given relevant documents as context before it answers, so that the answer is grounded in a specific, citable source rather than in the model's compressed training memory. The two-step shape is in the name: retrieve, then generate. Retrieval typically uses vector search over an indexed corpus; generation uses the retrieved chunks as the model's working context for that one query.
Reviewed by 7wData
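
To make the retrieve-then-generate shape concrete, here is a minimal sketch in Python. It rests on several assumptions that are not part of the definition above: the three-document corpus is made up, `embed` is a toy bag-of-words stand-in for a real embedding model, and the generation step is reduced to building the prompt that would be sent to a model. The names `CORPUS`, `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would index its own documents.
CORPUS = {
    "refund-policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "warranty": "The warranty covers hardware defects for 2 years.",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Indexing": embed every document once, ahead of query time.
INDEX = {doc_id: embed(text) for doc_id, text in CORPUS.items()}


def retrieve(query, k=2):
    """Step 1: return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: cosine(q, INDEX[d]), reverse=True)
    return ranked[:k]


def build_prompt(query, doc_ids):
    """Step 2 (input side): put the retrieved chunks into the model's context.

    Each source is numbered so the answer can cite it.
    """
    sources = "\n".join(
        f"[{i}] ({doc_id}) {CORPUS[doc_id]}"
        for i, doc_id in enumerate(doc_ids, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "How long does shipping take?"
    print(build_prompt(question, retrieve(question, k=1)))
```

Because the retrieved document ids travel with the prompt, the calling system can display them next to the answer. A production setup would swap the toy `embed` for a real embedding model and the dictionary scan for a vector index.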

Why it matters

RAG is one of the most widely adopted patterns for reducing hallucination in production LLM deployments. It does two useful things at once. It moves the source of truth from “what the model happened to remember during training” to “what is in your indexed corpus right now,” which means you control what the model can ground its answers in. And it makes citation possible: because the retrieved chunks are explicitly part of the answer’s input, the system can show the user which sources backed the response. That visibility is what turns a chat into an answer engine.

Where you’ll encounter it

You will encounter RAG in three families of products: enterprise search (the model answers questions against your internal documents), customer-facing assistants (the model answers against the product knowledge base or policy library), and research / analyst tools (the model answers against a curated corpus of articles, papers, or filings). Common failure modes include retrieving irrelevant chunks (poor embeddings or a stale index), retrieving too few chunks (the model has nothing to ground its answer on), retrieving too many chunks (the context window overflows or the model wanders), and chunk-quality drift over time (the index isn’t being refreshed).
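
The failure modes above can each be met with a simple guard between retrieval and generation. The following Python sketch shows one possible shape, not a recommended configuration: the `Chunk` type, the `select_chunks` function, and every threshold (`min_score`, `min_chunks`, `token_budget`, `max_age`) are hypothetical, and the token count is a crude whitespace word count rather than a real tokenizer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Chunk:
    text: str
    score: float          # similarity to the query, higher is better
    indexed_at: datetime  # when this chunk was last refreshed in the index


def select_chunks(
    chunks,
    now,
    min_score=0.3,               # assumed cutoff for "relevant enough"
    min_chunks=1,                # assumed minimum needed to ground an answer
    token_budget=50,             # assumed share of the context window
    max_age=timedelta(days=30),  # assumed freshness limit
):
    """Return the chunks to pass to the model, or None if grounding is too weak."""
    # Guard against irrelevant chunks and stale index entries.
    usable = [
        c for c in chunks
        if c.score >= min_score and now - c.indexed_at <= max_age
    ]
    usable.sort(key=lambda c: c.score, reverse=True)

    # Guard against too many chunks: stop once the budget would be exceeded.
    selected, used = [], 0
    for c in usable:
        cost = len(c.text.split())  # crude token estimate
        if used + cost > token_budget:
            break
        selected.append(c)
        used += cost

    # Guard against too few chunks: signal the caller instead of answering ungrounded.
    if len(selected) < min_chunks:
        return None
    return selected
```

Returning `None` lets the caller decide what to do when nothing usable was retrieved, for example telling the user that no source was found instead of letting the model answer from memory.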


Part of the 7wData AI Glossary. Tracking how concepts like this move in the expert conversation: daily signals at ins7ghts.com.