The Uncomfortable Truth About Chunking
I build enterprise-grade RAG solutions. Every system I deploy uses chunking. And every system suffers because of it.
Chunking—splitting documents into smaller pieces for vector search—is the foundation of modern retrieval. It’s also fundamentally broken. Here’s why.
The Seven Deadly Sins of Chunking
1. Adjacent Context Vanishes
Once you chunk a document, vector lookups become isolated searches. You find the relevant chunk, but you lose what came before and after.
The problem: Critical context lives in adjacent chunks. A definition in paragraph 1 explains the concept in paragraph 3. A table header three chunks up is what makes the data rows intelligible.
The band-aid: Add document summaries to each chunk. Now you have redundancy, larger chunks, and you still don’t have the exact adjacent content.
2. Automated Chunking is Arbitrary
Most chunking strategies use token counts (512 tokens, 1000 tokens) or sentence boundaries. Documents don’t respect these artificial boundaries.
What happens:
- A code example gets split mid-function
- A list of requirements gets severed between items 3 and 4
- A multi-paragraph argument loses its conclusion
The reality: There’s no universal chunk size. Legal contracts need different splits than API documentation. Product specs need different boundaries than support tickets.
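To make the problem concrete, here is a minimal sketch of the fixed-window splitter most pipelines start with (word-based for simplicity; real systems count model tokens):

```python
# Naive fixed-size chunking: boundaries fall wherever the window ends,
# regardless of code blocks, lists, or argument structure.
def fixed_size_chunks(text: str, chunk_size: int = 512) -> list[str]:
    words = text.split()  # stand-in for a real tokenizer
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Nothing in that loop knows where a function body ends or where an argument reaches its conclusion. The boundary lands wherever the counter says it lands.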
3. Manual Chunking Doesn’t Scale
The alternative—having humans decide chunk boundaries—works beautifully for 100 documents. It collapses at 100,000.
The math:
- 10 minutes per document to review and mark boundaries
- 100,000 documents = 1,000,000 minutes ≈ 16,667 hours ≈ 8+ years of full-time work (at roughly 2,000 working hours per year)
Even with domain experts, manual chunking is a non-starter for production systems.
4. Second and Third-Order Insights Disappear
Documents contain multiple layers of meaning:
- Level 1: Direct answers (what the text explicitly states)
- Level 2: Implied relationships (connections between sections)
- Level 3: Structural insights (what the table of contents reveals, what’s emphasized, what’s omitted)
Chunking preserves Level 1. It destroys Level 2 and Level 3.
Example: A security policy document mentions “encryption” 47 times across 12 sections, but only discusses key rotation in an appendix. The frequency + placement tells you something. Isolated chunks don’t.
5. Single-Step Lookup is Shallow
Simple questions work fine:
- “What is the return policy?” → Find the return policy chunk → Done
Complex questions break:
- “How does the return policy interact with warranty claims for defective products purchased with a coupon?”
This needs:
- Return policy chunk
- Warranty terms chunk
- Promotion rules chunk
- Defect definition chunk
- Potentially case studies or examples
Single-step vector search finds the closest match. It doesn’t assemble multi-faceted answers.
6. Relationships Are Severed
Documents encode relationships:
- This policy overrides that guideline
- Section 4.2 refers back to Section 2.1
- The Q3 report updates the Q2 projections
- Method B is the recommended approach; Method A is deprecated
When you chunk, these connections vanish. Each chunk becomes an isolated fact. You lose the graph of how information relates.
7. Metadata is Inconsistent
Even when chunks preserve metadata (document ID, section, timestamp), they don’t capture:
- The chunk’s role in the larger argument
- Whether this is the authoritative statement or a counterexample
- If this content has been superseded
- The confidence level or source quality
You get chunks with tags. You don’t get chunks with meaning.
Why We’re Stuck With It
If chunking is so broken, why does everyone use it?
Because the alternatives are worse:
Embed entire documents: Doesn’t work beyond small corpora. A 50-page document embedded as one vector loses granularity.
Hierarchical summaries: Help with recall, but add latency and still rely on chunking at the leaf level.
Graph-based retrieval: Great for entity relationships, terrible for prose. Building accurate knowledge graphs requires structured data or massive overhead.
Semantic parsing: Parse documents into meaning units instead of token windows. Theoretically elegant, practically fragile. Every document type needs custom parsers.
Chunking is the least-bad option. It’s simple, it’s fast, it works “well enough” for basic retrieval.
What Actually Helps (Incrementally)
Since we’re stuck with chunking, here’s what reduces the pain:
Overlap Aggressively
50% overlap between chunks. Yes, it doubles storage. It also captures boundary context.
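Continuing the word-based sketch from earlier, overlap is just a smaller step size (again illustrative; production code would count tokenizer tokens):

```python
# 50% overlap: the window advances by half its width, so content near a
# boundary always appears intact in at least one chunk.
def overlapping_chunks(text: str, chunk_size: int = 512) -> list[str]:
    words = text.split()
    step = max(chunk_size // 2, 1)
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - step, 1), step)
    ]
```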
Store Chunk Neighbors
Metadata should include prev_chunk_id and next_chunk_id. When you retrieve chunk N, automatically pull N-1 and N+1.
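A sketch of that expansion step, assuming each chunk record carries those ids and the chunk store is any mapping from chunk id to chunk (the names here are illustrative, not a specific library's API):

```python
# Return the retrieved chunk plus its immediate neighbors, in document order.
def expand_with_neighbors(hit: dict, chunk_store: dict) -> list[dict]:
    ids = [hit.get("prev_chunk_id"), hit["chunk_id"], hit.get("next_chunk_id")]
    return [chunk_store[i] for i in ids if i is not None and i in chunk_store]
```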
Hybrid Retrieval
BM25/sparse + dense vectors. Lexical search finds exact terms; vectors find semantic similarity. Rerank with a cross-encoder.
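A rough sketch using rank_bm25 and sentence-transformers (the model names and the 50/50 score blend are placeholder choices, not recommendations):

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

# Toy corpus of chunk texts; in practice these come from your chunk store.
docs = [
    "Returns are accepted within 30 days of purchase with a receipt.",
    "Warranty claims for defective products require proof of the defect.",
]

bm25 = BM25Okapi([d.lower().split() for d in docs])             # lexical index
encoder = SentenceTransformer("all-MiniLM-L6-v2")               # bi-encoder
doc_vecs = encoder.encode(docs, normalize_embeddings=True)      # dense index
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_search(query: str, k: int = 20, alpha: float = 0.5) -> list[int]:
    # Lexical scores catch exact terms; dense scores catch paraphrases.
    sparse = np.asarray(bm25.get_scores(query.lower().split()))
    dense = doc_vecs @ encoder.encode(query, normalize_embeddings=True)

    # Min-max normalize each score set, blend, and keep the top candidates.
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-9)
    fused = alpha * norm(sparse) + (1 - alpha) * norm(dense)
    candidates = np.argsort(fused)[::-1][:100]

    # Cross-encoder rerank of the fused candidate pool.
    scores = reranker.predict([(query, docs[i]) for i in candidates])
    return [int(candidates[i]) for i in np.argsort(scores)[::-1][:k]]
```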
Rich Metadata
Every chunk needs:
- doc_id, section_id, chunk_index
- doc_type, source, timestamp, version
- parent_summary (one-sentence doc summary)
- context_before, context_after (titles/headers from surrounding chunks)
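Making that schema explicit keeps it consistent across the pipeline. A sketch with a dataclass; the field names mirror the list above and are not tied to any particular vector store:

```python
from dataclasses import dataclass

@dataclass
class ChunkMetadata:
    # Identity and position
    doc_id: str
    section_id: str
    chunk_index: int
    # Provenance
    doc_type: str
    source: str
    timestamp: str
    version: str
    # Context carried along with the chunk
    parent_summary: str = ""           # one-sentence summary of the whole document
    context_before: str = ""           # title/header from the preceding chunk
    context_after: str = ""            # title/header from the following chunk
    prev_chunk_id: str | None = None   # neighbor pointers for adjacent retrieval
    next_chunk_id: str | None = None
```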
Multi-Stage Retrieval
- First pass: retrieve 100 candidates (dense + sparse)
- Rerank: cross-encoder narrows to top 20
- Expansion: fetch adjacent chunks for top 5
- Assembly: dedupe, diversify, sequence by document order
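Wired together, the four stages look roughly like this. The index, reranker, and chunk store are assumed interfaces standing in for whatever search backend and models you actually use:

```python
# Sketch of the four-stage pipeline. `index`, `reranker`, and `chunk_store`
# are assumed interfaces; chunk dicts are assumed to carry chunk_id, text,
# doc_id, chunk_index, and neighbor ids as in the metadata sketch above.
def multi_stage_retrieve(query: str, index, reranker, chunk_store: dict) -> list[dict]:
    # 1. First pass: wide candidate pool from both retrievers, deduped by id.
    candidates = {c["chunk_id"]: c for c in index.dense_search(query, k=100)}
    candidates.update({c["chunk_id"]: c for c in index.sparse_search(query, k=100)})

    # 2. Rerank: cross-encoder narrows the pool to the top 20.
    pool = list(candidates.values())
    scores = reranker.predict([(query, c["text"]) for c in pool])
    ranked = [c for _, c in sorted(zip(scores, pool), key=lambda p: p[0], reverse=True)]
    top20 = ranked[:20]

    # 3. Expansion: fetch adjacent chunks for the top 5 hits.
    expanded = list(top20)
    for c in top20[:5]:
        for nid in (c.get("prev_chunk_id"), c.get("next_chunk_id")):
            if nid is not None and nid in chunk_store:
                expanded.append(chunk_store[nid])

    # 4. Assembly: dedupe and sequence by document order
    #    (diversification, e.g. capping chunks per document, omitted here).
    unique = {c["chunk_id"]: c for c in expanded}
    return sorted(unique.values(), key=lambda c: (c["doc_id"], c["chunk_index"]))
```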
Accept the Limits
Chunking works for:
- FAQ retrieval
- Policy lookups
- Code snippet search
- Simple question answering
Chunking fails for:
- Multi-step reasoning
- Comparative analysis
- Causal explanations
- Nuanced interpretation
Use RAG for the former. For the latter, you need agentic workflows, graph traversal, or long-context LLMs processing full documents.
The Real Problem
Chunking isn’t broken because it’s a bad technique. It’s broken because we’re asking it to do something it was never designed for: preserve meaning while destroying structure.
Vector embeddings represent semantic similarity in a continuous space. Chunks are discrete, arbitrary fragments. We’re trying to make discontinuous pieces behave like continuous meaning. It’s a category error.
The future isn’t better chunking. It’s moving past chunking entirely—toward models that can natively process long contexts, graph-augmented retrieval that preserves relationships, and semantic parsing that respects document structure.
Until then, we chunk. We add summaries, overlap, metadata, reranking, and adjacent retrieval. We make it work.
But let’s not pretend it’s not broken.
What are you seeing? If you’re building RAG systems, what chunking strategies actually work for you? Where does it fail? Let’s compare notes.