Retrieval-Augmented Generation (RAG)
: An AI architecture pattern that enhances LLM responses by retrieving relevant documents from a knowledge base before generating an answer. RAG combines information retrieval (searching a vector database) with text generation (LLM), grounding the model’s output in specific, up-to-date, and domain-specific information.
Why It Matters for AI-Coded Apps
RAG is the most practical way to give LLMs access to private or current data without fine-tuning. However, RAG introduces security considerations: the retrieved documents may contain sensitive information, prompt injection can manipulate retrieval, and poor chunking leads to inaccurate or misleading responses.
Real-World Example
A support chatbot uses RAG: 1) User asks ‘How do I reset my password?’ 2) The system embeds the question and searches the vector database for similar documentation chunks. 3) Top-k relevant chunks are retrieved and added to the LLM’s context. 4) The LLM generates an answer grounded in the actual documentation.
How to Detect and Prevent It
Implement access controls on the retrieval layer – users should only retrieve documents they’re authorized to see. Sanitize retrieved content before passing to the LLM to prevent indirect prompt injection. Monitor for data exfiltration through RAG responses. Use proper chunking strategies to ensure accurate retrieval.
Frequently Asked Questions
What is the difference between RAG and fine-tuning?
RAG retrieves external knowledge at query time without modifying the model. Fine-tuning permanently adjusts model weights with new training data. RAG is better for frequently changing data and specific document Q&A. Fine-tuning is better for teaching new skills or changing the model’s behavior.
What vector databases work best for RAG?
Popular options: Pinecone (managed, easy setup), Weaviate (open source, full-featured), Chroma (lightweight, local development), Qdrant (high performance, open source), pgvector (PostgreSQL extension, good for existing Postgres users). Choice depends on scale, hosting preference, and existing infrastructure.
How do I prevent prompt injection through RAG?
Sanitize documents before indexing. Tag retrieved content as untrusted data in the prompt. Use separate system instructions that the LLM prioritizes over retrieved content. Monitor for suspicious retrieval patterns. Implement output filtering to catch leaked instructions or data.
Scan your app for security issues automatically
Vibe Eval checks for 200+ vulnerabilities in AI-generated code.
Try Vibe Eval