By Stephen DeAngelis
News stories about large language models (LLMs) have often highlighted the outrageous behavior or false claims such models have generated. One reason that LLMs have occasionally gone off the rails is that they are rich in general knowledge but lack context. To overcome this pitfall, companies are now using retrieval-augmented generation (RAG), which enhances LLMs with enterprise data. The term was coined by a team of Meta Platforms AI researchers led by Patrick Lewis in a 2020 paper entitled “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” In an interview with retired journalist Rick Merritt, Lewis “apologized for the unflattering acronym that now describes a growing family of methods across hundreds of papers and dozens of commercial services he believes represent the future of generative AI.”[1] He told Merritt, “We definitely would have put more thought into the name had we known our work would become so widespread. We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea.”
What Is Retrieval-Augmented Generation?
The staff at Amazon Web Services explains, “Retrieval-Augmented Generation is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response. Large Language Models are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences. RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.”[2]
The staff at Google Cloud explains RAG in a similar manner. They write, “RAG is an AI framework that combines the strengths of traditional information retrieval systems (such as search and databases) with the capabilities of generative large language models. By combining your data and world knowledge with LLM language skills, grounded generation is more accurate, up-to-date, and relevant to your specific needs.”[3]
Merritt compares RAG to the legal staff supporting a judge. He writes, “Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute — requires special expertise, so judges send court clerks to a law library looking for precedents and specific cases they can cite. Like a good judge, large language models can respond to a wide variety of human queries. But to deliver authoritative answers — grounded in specific court proceedings or similar ones — the model needs to be provided that information. The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.” In other words, LLMs are most useful when they have context.
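To make that retrieve-then-generate flow concrete, here is a minimal sketch in Python. Everything in it (the three-document store, the keyword-overlap scoring, and the generate() stub standing in for a model call) is an illustrative assumption, not any particular vendor's implementation; a production system would use embeddings, a vector index, and a real LLM API.

```python
# A toy RAG loop. The document store, keyword-overlap scoring, and
# generate() stub are illustrative stand-ins; real systems use
# embeddings, a vector index, and an actual LLM API.

DOCUMENTS = [
    "Acme's 2024 return policy allows refunds within 30 days of purchase.",
    "Acme's support line is open weekdays from 9 a.m. to 5 p.m. Eastern.",
    "Acme ships internationally to 42 countries.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to any large language model."""
    return f"[model response grounded in a {len(prompt)}-character prompt]"

def answer(question: str) -> str:
    # The heart of RAG: fetch authoritative context first, then hand
    # it to the model alongside the question.
    context = "\n".join(retrieve(question))
    prompt = f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer("What is the return policy?"))
```

The key design point is that the model is never asked to answer from memory alone; the retrieval step supplies the “law library” Merritt describes.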
The Benefits of RAG
Journalist Steven Rosenbush observes that the public is often “bewildered” by the enthusiasm companies are showing for generative AI.[4] The public, he notes, only sees consumer-facing chatbots. What they don’t see, he writes, is that “behind the scenes, a lot of companies are deploying more specialized, internal AI tools that tap their data — and that is where they are looking for the real payoff from AI.” How big is the movement toward RAG-aided generative AI? According to Sylvain Duranton, global leader of BCG X, “It’s massive.” He told Rosenbush, “Most of what we do [for large corporations] is RAG-based.”
The most obvious benefit of RAG is succinctly stated by the IBM staff: “RAG helps large language models deliver more relevant responses at a higher quality.”[5] They go on to note, “RAG empowers organizations to avoid high retraining costs when adapting generative AI models to domain-specific use cases. Enterprises can use RAG to complete gaps in a machine learning model’s knowledge base so it can provide better answers. The primary benefits of RAG include: Cost-efficient AI implementation and AI scaling; access to current domain-specific data; lower risk of AI hallucinations; increased user trust; expanded use cases; enhanced developer control and model maintenance; [and] greater data security.” The Google Cloud staff adds, “Providing ‘facts’ to the LLM as part of the input prompt can mitigate ‘gen AI hallucinations.’ The crux of this approach is ensuring that the most relevant facts are provided to the LLM, and that the LLM output is entirely grounded on those facts while also answering the user’s question and adhering to system instructions and safety constraints.”
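The grounding step the Google Cloud staff describes can be sketched as a simple prompt template. The template wording, the relevance scores, and the 0.5 threshold below are assumptions made for illustration; the point is that weakly matching facts are filtered out and the model is instructed to decline rather than guess.

```python
# A sketch of grounded prompt assembly. The template text, scores, and
# threshold are hypothetical choices, not a standard.

GROUNDED_TEMPLATE = """You are a company assistant.
Use ONLY the facts below. If the facts do not contain the answer,
say "I don't have that information" rather than guessing.

Facts:
{facts}

Question: {question}
Answer:"""

def build_grounded_prompt(question: str, facts: list[str],
                          scores: list[float], min_score: float = 0.5) -> str:
    # Keep only facts above a relevance threshold so weak matches
    # don't invite the model to speculate.
    kept = [f for f, s in zip(facts, scores) if s >= min_score]
    if not kept:
        kept = ["(no relevant facts retrieved)"]
    return GROUNDED_TEMPLATE.format(
        facts="\n".join(f"- {fact}" for fact in kept),
        question=question,
    )

print(build_grounded_prompt(
    "When was the company founded?",
    ["Acme was founded in 1987.", "Acme sells widgets."],
    [0.91, 0.12],
))
```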
RAG also enhances prompt engineering efforts (i.e., the process of asking questions and giving instructions to a large language model). Journalist Belle Lin explains, “Prompt engineering has emerged … as an essential new skill for employees, so that they can generate better text summaries, data analyses and email drafts from AI chatbots and other applications. It is also used as a way to provide general large language models with specific company information, so that it provides more tailored responses.”[6] The IBM staff notes, “RAG systems essentially enable users to query databases with conversational language.” Because RAG systems are user-friendly, they can be leveraged in a number of ways. The IBM staff explains, “The data-powered question-answering abilities of RAG systems have been applied across a range of use cases, including: Specialized chatbots and virtual assistants; research; content generation; market analysis and product development; knowledge engines; [and] recommendation services.”
Concluding Thoughts
Author and keynote speaker Dean DeBiase explains, “For many companies, LLMs are still the best choice for specific projects. For others, though, they can be expensive for businesses to run, as measured in dollars, energy, and computing resources. … I suspect there are emerging alternatives that will work better in certain instances — and my discussions with dozens of CEOs support that [prediction].”[7] RAG has certainly made LLMs more useful, although, as DeBiase notes, alternatives like small language models may achieve similar results at lower cost. As I noted in a LinkedIn post, “There's often a misconception that small language models (SLMs) are less effective than large language models (LLMs) — but the reality is, each serves different functions/needs based on the source data and information they use and the trustworthiness they bring.” RAG was designed to improve trust for users of LLMs.
Despite the many benefits of RAG, Varun Raj, a cloud and AI engineering executive, offers a word of caution. He writes, “Many organizations are discovering that retrieval is no longer a feature bolted onto model inference — it has become a foundational system dependency. Once AI systems are deployed to support decision-making, automate workflows or operate semi-autonomously, failures in retrieval propagate directly into business risk. Stale context, ungoverned access paths and poorly evaluated retrieval pipelines do not merely degrade answer quality; they undermine trust, compliance and operational reliability.”[8] He argues that freshness, governance, and evaluation of data must be built into systems employing RAG. He explains, “Freshness, governance and evaluation are not optional optimizations; they are prerequisites for deploying AI systems that operate reliably in real-world environments. As organizations push beyond experimental RAG deployments toward autonomous and decision-support systems, the architectural treatment of retrieval will increasingly determine success or failure.”
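Raj's argument can be illustrated with a small sketch: freshness and access checks applied inside the retrieval path itself, before any document reaches the model. The Document shape, the 90-day freshness window, and the role-based permission model below are hypothetical choices made for illustration, not a reference architecture.

```python
# A sketch of governed retrieval: stale or unauthorized documents are
# filtered out BEFORE they reach the model, so retrieval failures don't
# propagate into business risk. All names and thresholds are assumptions.

from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Document:
    text: str
    updated: date            # last-refreshed date, for freshness checks
    allowed_roles: set[str]  # governance: who may see this content

def governed_retrieve(docs: list[Document], user_role: str,
                      max_age_days: int = 90) -> list[Document]:
    """Keep only documents that are both fresh and visible to the user."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [
        d for d in docs
        if d.updated >= cutoff and user_role in d.allowed_roles
    ]

docs = [
    Document("Q3 pricing sheet", date(2021, 1, 5), {"sales"}),   # stale
    Document("Current pricing sheet", date.today(), {"sales"}),
    Document("Board minutes", date.today(), {"executive"}),      # restricted
]
for d in governed_retrieve(docs, user_role="sales"):
    print(d.text)  # only "Current pricing sheet" passes both checks
```

In this framing, evaluation means continuously testing such filters against real queries, not bolting checks on after deployment.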
Footnotes
[1] Rick Merritt, “What Is Retrieval-Augmented Generation, aka RAG?” NVIDIA Blog, 31 January 2025.
[2] Staff, “What is RAG (Retrieval-Augmented Generation)?” Amazon Web Services.
[3] Staff, “What is Retrieval-Augmented Generation (RAG)?” Google Cloud.
[4] Steven Rosenbush, “Companies Look Past Chatbots for AI Payoff,” The Wall Street Journal, 23 October 2024.
[5] Staff, “What is retrieval augmented generation (RAG)?” IBM.
[6] Belle Lin, “From RAGs to Vectors: How Businesses Are Customizing AI Models,” The Wall Street Journal, 21 May 2024.
[7] Dean DeBiase, “Why Small Language Models Are The Next Big Thing In AI,” Forbes, 25 November 2024.
[8] Varun Raj, “Enterprises are measuring the wrong part of RAG,” VentureBeat, 1 February 2026.