The Uncomfortable Truth About Chunking
I build enterprise-grade RAG solutions. Every system I deploy uses chunking. And every system suffers because of it.
Chunking—splitting documents into smaller pieces for vector search—is the foundation of modern retrieval. It’s also fundamentally broken. Here’s why.
The Seven Deadly Sins of Chunking
1. Adjacent Context Vanishes
Once you chunk a document, vector lookups become isolated searches. You find the relevant chunk, but you lose what came before and after.
The problem: Critical context lives in adjacent chunks. A definition in paragraph 1 explains the concept in paragraph 3. A table header three chunks up is what makes the data rows intelligible.
The band-aid: Add document summaries to each chunk. Now you have redundancy, larger chunks, and you still don’t have the exact adjacent content.
2. Automated Chunking is Arbitrary
Most chunking strategies use token counts (512 tokens, 1000 tokens) or sentence boundaries. Documents don’t respect these artificial boundaries.
What happens:
- A code example gets split mid-function
- A list of requirements gets severed between items 3 and 4
- A multi-paragraph argument loses its conclusion
The reality: There’s no universal chunk size. Legal contracts need different splits than API documentation. Product specs need different boundaries than support tickets.
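To make the problem concrete, here is a minimal sketch of the fixed-window splitter most pipelines start with (word-based for simplicity; real systems count model tokens):

```python
# Naive fixed-size chunking: boundaries fall wherever the window ends,
# regardless of code blocks, lists, or argument structure.
def fixed_size_chunks(text: str, chunk_size: int = 512) -> list[str]:
    words = text.split()  # stand-in for a real tokenizer
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Nothing in that loop knows where a function body ends or where an argument reaches its conclusion. The boundary lands wherever the counter says it lands.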
3. Manual Chunking Doesn’t Scale
The alternative—having humans decide chunk boundaries—works beautifully for 100 documents. It collapses at 100,000.
The math:
- 10 minutes per document to review and mark boundaries
- 100,000 documents = 1,000,000 minutes ≈ 16,667 hours ≈ 8+ years of full-time work (at roughly 2,000 working hours per year)
Even with domain experts, manual chunking is a non-starter for production systems.
4. Second and Third-Order Insights Disappear
Documents contain multiple layers of meaning:
- Level 1: Direct answers (what the text explicitly states)
- Level 2: Implied relationships (connections between sections)
- Level 3: Structural insights (what the table of contents reveals, what’s emphasized, what’s omitted)
Chunking preserves Level 1. It destroys Level 2 and Level 3.
Example: A security policy document mentions “encryption” 47 times across 12 sections, but only discusses key rotation in an appendix. The frequency + placement tells you something. Isolated chunks don’t.
5. Single-Step Lookup is Shallow
Simple questions work fine:
- “What is the return policy?” → Find the return policy chunk → Done
Complex questions break:
- “How does the return policy interact with warranty claims for defective products purchased with a coupon?”
This needs:
- Return policy chunk
- Warranty terms chunk
- Promotion rules chunk
- Defect definition chunk
- Potentially case studies or examples
Single-step vector search finds the closest match. It doesn’t assemble multi-faceted answers.
6. Relationships Are Severed
Documents encode relationships:
- This policy overrides that guideline
- Section 4.2 refers back to Section 2.1
- The Q3 report updates the Q2 projections
- Method B is the recommended approach; Method A is deprecated
When you chunk, these connections vanish. Each chunk becomes an isolated fact. You lose the graph of how information relates.
7. Metadata is Inconsistent
Even when chunks preserve metadata (document ID, section, timestamp), they don’t capture:
- The chunk’s role in the larger argument
- Whether this is the authoritative statement or a counterexample
- If this content has been superseded
- The confidence level or source quality
You get chunks with tags. You don’t get chunks with meaning.
Why We’re Stuck With It
If chunking is so broken, why does everyone use it?
Because the alternatives are worse:
Embed entire documents: Doesn’t work beyond small corpora. A 50-page document embedded as one vector loses granularity.
Hierarchical summaries: Help with recall, but add latency and still rely on chunking at the leaf level.
Graph-based retrieval: Great for entity relationships, terrible for prose. Building accurate knowledge graphs requires structured data or massive overhead.
Semantic parsing: Parse documents into meaning units instead of token windows. Theoretically elegant, practically fragile. Every document type needs custom parsers.
Chunking is the least-bad option. It’s simple, it’s fast, it works “well enough” for basic retrieval.
What Actually Helps (Incrementally)
Since we’re stuck with chunking, here’s what reduces the pain:
Overlap Aggressively
50% overlap between chunks. Yes, it doubles storage. It also captures boundary context.
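Continuing the word-based sketch from earlier, overlap is just a smaller step size (again illustrative; production code would count tokenizer tokens):

```python
# 50% overlap: the window advances by half its width, so content near a
# boundary always appears intact in at least one chunk.
def overlapping_chunks(text: str, chunk_size: int = 512) -> list[str]:
    words = text.split()
    step = max(chunk_size // 2, 1)
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - step, 1), step)
    ]
```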
Store Chunk Neighbors
Metadata should include prev_chunk_id and next_chunk_id. When you retrieve chunk N, automatically pull N-1 and N+1.
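A sketch of that expansion step, assuming each chunk record carries those ids and the chunk store is any mapping from chunk id to chunk (the names here are illustrative, not a specific library's API):

```python
# Return the retrieved chunk plus its immediate neighbors, in document order.
def expand_with_neighbors(hit: dict, chunk_store: dict) -> list[dict]:
    ids = [hit.get("prev_chunk_id"), hit["chunk_id"], hit.get("next_chunk_id")]
    return [chunk_store[i] for i in ids if i is not None and i in chunk_store]
```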
Hybrid Retrieval
BM25/sparse + dense vectors. Lexical search finds exact terms; vectors find semantic similarity. Rerank with a cross-encoder.
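A rough sketch using rank_bm25 and sentence-transformers (the model names and the 50/50 score blend are placeholder choices, not recommendations):

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

# Toy corpus of chunk texts; in practice these come from your chunk store.
docs = [
    "Returns are accepted within 30 days of purchase with a receipt.",
    "Warranty claims for defective products require proof of the defect.",
]

bm25 = BM25Okapi([d.lower().split() for d in docs])             # lexical index
encoder = SentenceTransformer("all-MiniLM-L6-v2")               # bi-encoder
doc_vecs = encoder.encode(docs, normalize_embeddings=True)      # dense index
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_search(query: str, k: int = 20, alpha: float = 0.5) -> list[int]:
    # Lexical scores catch exact terms; dense scores catch paraphrases.
    sparse = np.asarray(bm25.get_scores(query.lower().split()))
    dense = doc_vecs @ encoder.encode(query, normalize_embeddings=True)

    # Min-max normalize each score set, blend, and keep the top candidates.
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-9)
    fused = alpha * norm(sparse) + (1 - alpha) * norm(dense)
    candidates = np.argsort(fused)[::-1][:100]

    # Cross-encoder rerank of the fused candidate pool.
    scores = reranker.predict([(query, docs[i]) for i in candidates])
    return [int(candidates[i]) for i in np.argsort(scores)[::-1][:k]]
```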
Rich Metadata
Every chunk needs:
- doc_id, section_id, chunk_index
- doc_type, source, timestamp, version
- parent_summary (one-sentence doc summary)
- context_before, context_after (titles/headers from surrounding chunks)
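Making that schema explicit keeps it consistent across the pipeline. A sketch with a dataclass; the field names mirror the list above and are not tied to any particular vector store:

```python
from dataclasses import dataclass

@dataclass
class ChunkMetadata:
    # Identity and position
    doc_id: str
    section_id: str
    chunk_index: int
    # Provenance
    doc_type: str
    source: str
    timestamp: str
    version: str
    # Context carried along with the chunk
    parent_summary: str = ""           # one-sentence summary of the whole document
    context_before: str = ""           # title/header from the preceding chunk
    context_after: str = ""            # title/header from the following chunk
    prev_chunk_id: str | None = None   # neighbor pointers for adjacent retrieval
    next_chunk_id: str | None = None
```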
Multi-Stage Retrieval
- First pass: retrieve 100 candidates (dense + sparse)
- Rerank: cross-encoder narrows to top 20
- Expansion: fetch adjacent chunks for top 5
- Assembly: dedupe, diversify, sequence by document order
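Wired together, the four stages look roughly like this. The index, reranker, and chunk store are assumed interfaces standing in for whatever search backend and models you actually use:

```python
# Sketch of the four-stage pipeline. `index`, `reranker`, and `chunk_store`
# are assumed interfaces; chunk dicts are assumed to carry chunk_id, text,
# doc_id, chunk_index, and neighbor ids as in the metadata sketch above.
def multi_stage_retrieve(query: str, index, reranker, chunk_store: dict) -> list[dict]:
    # 1. First pass: wide candidate pool from both retrievers, deduped by id.
    candidates = {c["chunk_id"]: c for c in index.dense_search(query, k=100)}
    candidates.update({c["chunk_id"]: c for c in index.sparse_search(query, k=100)})

    # 2. Rerank: cross-encoder narrows the pool to the top 20.
    pool = list(candidates.values())
    scores = reranker.predict([(query, c["text"]) for c in pool])
    ranked = [c for _, c in sorted(zip(scores, pool), key=lambda p: p[0], reverse=True)]
    top20 = ranked[:20]

    # 3. Expansion: fetch adjacent chunks for the top 5 hits.
    expanded = list(top20)
    for c in top20[:5]:
        for nid in (c.get("prev_chunk_id"), c.get("next_chunk_id")):
            if nid is not None and nid in chunk_store:
                expanded.append(chunk_store[nid])

    # 4. Assembly: dedupe and sequence by document order
    #    (diversification, e.g. capping chunks per document, omitted here).
    unique = {c["chunk_id"]: c for c in expanded}
    return sorted(unique.values(), key=lambda c: (c["doc_id"], c["chunk_index"]))
```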
Accept the Limits
Chunking works for:
- FAQ retrieval
- Policy lookups
- Code snippet search
- Simple question answering
Chunking fails for:
- Multi-step reasoning
- Comparative analysis
- Causal explanations
- Nuanced interpretation
Use RAG for the former. For the latter, you need agentic workflows, graph traversal, or long-context LLMs processing full documents.
The Real Problem
Chunking isn’t broken because it’s a bad technique. It’s broken because we’re asking it to do something it was never designed for: preserve meaning while destroying structure.
Vector embeddings represent semantic similarity in a continuous space. Chunks are discrete, arbitrary fragments. We’re trying to make discontinuous pieces behave like continuous meaning. It’s a category error.
The future isn’t better chunking. It’s moving past chunking entirely—toward models that can natively process long contexts, graph-augmented retrieval that preserves relationships, and semantic parsing that respects document structure.
Until then, we chunk. We add summaries, overlap, metadata, reranking, and adjacent retrieval. We make it work.
But let’s not pretend it’s not broken.
What are you seeing? If you’re building RAG systems, what chunking strategies actually work for you? Where does it fail? Let’s compare notes.