Retrieval-Augmented Generation (RAG)
Why it matters
RAG is one of the most widely adopted patterns for reducing hallucination in production LLM deployments. It does two useful things at once. First, it moves the source of truth from “what the model happened to remember during training” to “what is in your indexed corpus right now,” so you control what the model can ground its answers in. Second, it makes citation possible: because the retrieved chunks are explicitly part of the model’s input, the system can show the user which sources backed the response. That visibility is what turns a chat into an answer engine.
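To make the grounding-and-citation loop concrete, here is a minimal Python sketch of the retrieve-then-prompt step. It assumes a toy in-memory corpus and uses bag-of-words cosine similarity as a stand-in for a real embedding model; `CORPUS`, `retrieve`, and `build_prompt` are hypothetical names, and the call to an actual LLM is deliberately left out. The point it illustrates is that the retrieved chunks are numbered inside the prompt, which is what lets the answer cite them.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical toy corpus; in a real system these chunks come from your indexed documents.
CORPUS = {
    "refund-policy": "A refund is issued within 14 days of purchase if the item is unused.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
    "warranty": "The warranty covers manufacturing defects for one year from purchase.",
}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 2) -> List[Tuple[str, float]]:
    """Return (chunk_id, score) for the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cid, cosine(q, Counter(tokenize(text)))) for cid, text in CORPUS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Assemble a grounded prompt whose numbered sources the model can cite as [n]."""
    hits = retrieve(query, k)
    sources = "\n".join(
        f"[{i}] ({cid}) {CORPUS[cid]}" for i, (cid, _score) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # The prompt would then be sent to whichever LLM you use (not shown here).
    print(build_prompt("Can I get a refund on an unused item?", k=1))
```

In a production system the scoring function would typically be an embedding model backed by a vector index, but the shape stays the same: retrieve, number the sources, and pass them to the model alongside the question.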
Where you’ll encounter it
You will encounter RAG in three families: enterprise search (the model answers questions against your internal documents), customer-facing assistants (the model answers against a product knowledge base or policy library), and research and analyst tools (the model answers against a curated corpus of articles, papers, or filings).

Common failure modes include retrieving irrelevant chunks (poor embeddings or a stale index), retrieving too few chunks (the model has nothing to ground its answer in), retrieving too many chunks (the context window overflows or the model drifts off-topic), and quality drift over time (the index isn’t refreshed as the source documents change, so chunks go stale).
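The failure modes above map onto a few simple guards at the point where retrieved chunks become model context. The following Python sketch assumes the retriever has already returned chunks with a similarity score and an indexing timestamp; `ScoredChunk`, `select_context`, the default thresholds, and the word-count token estimate are all hypothetical placeholders, not values or APIs from any particular RAG framework. It drops low-scoring chunks, fills a fixed token budget in score order, warns when too few chunks survive, and flags chunks whose index entry is older than a cutoff.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class ScoredChunk:
    chunk_id: str
    text: str
    score: float          # similarity reported by whatever retriever you use
    indexed_at: datetime  # when this chunk was last (re)indexed; timezone-aware


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace words); swap in your model's tokenizer."""
    return len(text.split())


def select_context(
    chunks: List[ScoredChunk],
    min_score: float = 0.3,                   # hypothetical relevance cutoff
    max_tokens: int = 50,                     # hypothetical, toy-sized context budget
    min_hits: int = 1,                        # fewer than this means "nothing to ground on"
    max_age: timedelta = timedelta(days=30),  # hypothetical staleness cutoff
    now: Optional[datetime] = None,
) -> Tuple[List[ScoredChunk], List[str]]:
    """Pick chunks for the prompt and return (selected_chunks, warnings)."""
    now = now or datetime.now(timezone.utc)
    warnings: List[str] = []

    # Irrelevant chunks: drop anything below the similarity cutoff.
    relevant = [c for c in chunks if c.score >= min_score]

    # Stale index: flag relevant chunks that have not been re-indexed recently.
    stale = [c.chunk_id for c in relevant if now - c.indexed_at > max_age]
    if stale:
        warnings.append(f"stale chunks (older than {max_age.days} days): {', '.join(stale)}")

    # Too many chunks: fill the token budget greedily in descending score order.
    selected: List[ScoredChunk] = []
    used = 0
    for chunk in sorted(relevant, key=lambda ch: ch.score, reverse=True):
        cost = approx_tokens(chunk.text)
        if used + cost > max_tokens:
            continue  # skip this chunk; a shorter, lower-ranked one may still fit
        selected.append(chunk)
        used += cost

    # Too few chunks: signal that any answer would be poorly grounded.
    if len(selected) < min_hits:
        warnings.append("too few relevant chunks; prefer 'not found' over guessing")

    return selected, warnings


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    demo = [
        ScoredChunk("fresh-short", "w " * 10, 0.9, now - timedelta(days=12)),
        ScoredChunk("fresh-long", "w " * 45, 0.8, now - timedelta(days=7)),
        ScoredChunk("stale-short", "w " * 5, 0.6, now - timedelta(days=150)),
        ScoredChunk("off-topic", "w " * 3, 0.1, now - timedelta(days=1)),
    ]
    picked, notes = select_context(demo, now=now)
    print([c.chunk_id for c in picked], notes)
```

Skipping an over-budget chunk rather than stopping lets a shorter, lower-ranked chunk still fit; whether that trade-off is right depends on how much the top-ranked chunks matter for your corpus. The staleness check only surfaces the problem; fixing it still means re-indexing the source documents.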
Part of the 7wData AI Glossary. Tracking how concepts like this move in the expert conversation: daily signals at ins7ghts.com.