Modern information retrieval has evolved far beyond simple keyword search. With the rise of reasoning-heavy queries, traditional retrieval-augmented generation (RAG) pipelines struggle to deliver relevant results. Many queries now require multi-step reasoning, inference, interpretation of indirect references, and the combination of information from multiple documents. To address this growing challenge, the DIVER framework was developed as a multi-stage retrieval system that integrates deep reasoning into the retrieval process.

DIVER is designed to handle complex, real-world queries that demand more than surface-level keyword matching. It introduces a structured pipeline combining LLM-driven query expansion, reasoning-enhanced retrieval, and an advanced reranking approach. As a result, it achieves state-of-the-art performance on the BRIGHT benchmark, exceeding many existing reasoning-aware retrieval models.
This blog provides an overview of DIVER, explains its architecture, highlights its key features, and explores why it represents a major advancement in retrieval for reasoning-intensive tasks.
What Is DIVER?
DIVER is a multi-stage retrieval framework designed specifically for reasoning-intensive information retrieval. Instead of relying solely on similarity metrics, DIVER incorporates deliberate reasoning steps at multiple stages of the retrieval process.
The framework consists of four core components:
- Document pre-processing
- LLM-driven iterative query expansion
- A reasoning-enhanced retriever
- A reranker that merges retrieval scores with LLM-generated helpfulness scores
Together, these stages enable DIVER to capture deeper semantic relationships and produce far more accurate retrieval results, especially on complex datasets.
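At a high level, the four stages compose into a single pipeline. The sketch below is illustrative only (DIVER's actual interfaces are not shown in this post); each stage is passed in as a plain callable so the flow of data between stages is explicit:

```python
from typing import Callable, List

def diver_pipeline(
    query: str,
    corpus: List[str],
    preprocess: Callable[[List[str]], List[str]],
    expand: Callable[[str], str],
    retrieve: Callable[[str, List[str]], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Run the four DIVER stages in order (illustrative sketch)."""
    chunks = preprocess(corpus)              # stage 1: document pre-processing
    expanded = expand(query)                 # stage 2: LLM-driven query expansion
    candidates = retrieve(expanded, chunks)  # stage 3: reasoning-enhanced retrieval
    return rerank(expanded, candidates)      # stage 4: merged reranking
```

In a real deployment, `expand` would wrap an LLM call and `retrieve`/`rerank` would wrap the DIVER-Retriever and merged reranker; here they are just function parameters.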
Why Traditional RAG Systems Fail on Complex Queries
Standard RAG systems typically use a simple pipeline:
- Encode the query
- Retrieve documents based on embedding similarity
- Pass retrieved context to an LLM for generation
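The baseline pipeline above reduces to "embed, compare, return top-k." A minimal sketch, using a toy bag-of-words vector in place of a real neural encoder, makes the limitation concrete: relevance is purely lexical/semantic overlap, with no reasoning step anywhere:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real RAG system would use a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Standard cosine similarity over sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_topk(query: str, docs: list, k: int = 2) -> list:
    # Encode the query, score every document by similarity, keep top-k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

A query whose answer requires inference rather than word overlap scores poorly under this scheme, which is exactly the failure mode described next.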
While effective for fact-based queries, this approach falls short when queries require:
- Abstract reasoning
- Multi-step inference
- Combining multiple concepts
- Understanding indirect relationships
- Domain-specific logic
Benchmark data from the BRIGHT dataset demonstrates that many traditional models plateau at low performance levels when reasoning complexity increases. DIVER addresses this by weaving reasoning deeply into every stage of the retrieval process.
The DIVER Architecture: A Four-Stage Pipeline
1. LLM-Driven Query Expansion
DIVER begins by improving the input query through iterative reasoning. The system uses a large language model to expand the original query into a richer, more detailed query representation.
This step helps the retriever understand:
- Multi-step reasoning
- Hidden context
- Implicit references
- Indirect connections
Unlike traditional query expansion, which focuses on adding synonyms or related words, DIVER’s expansion process generates structured reasoning paths to guide retrieval.
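The expansion loop can be sketched as repeated LLM rewriting. The prompt wording below is hypothetical (the post does not quote DIVER's actual prompts), and the LLM is passed in as a callable so the sketch stays self-contained:

```python
from typing import Callable

# Hypothetical prompt; DIVER's actual expansion prompts may differ.
EXPANSION_PROMPT = (
    "Rewrite the following query with the intermediate reasoning steps, "
    "implicit references, and background concepts needed to answer it:\n\n"
    "Query: {query}\n\nExpanded query:"
)

def expand_query(query: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Iteratively expand a query by feeding each expansion back to the LLM."""
    current = query
    for _ in range(rounds):
        current = llm(EXPANSION_PROMPT.format(query=current))
    return current
```

Each round enriches the previous expansion, which is what lets the final query encode multi-step reasoning paths rather than a flat list of synonyms.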
2. Reasoning-Enhanced Retriever
The retriever in DIVER is not a standard embedding model. It is a specialized retriever fine-tuned on synthetically generated complex queries. These training samples mimic real-world reasoning patterns and help the model understand intricate relationships within the data.
Example improvements include:
- Higher sensitivity to logical structure
- Better alignment with expanded queries
- Stronger performance on domain-specific reasoning tasks
The DIVER-Retriever models come in multiple sizes, including:
- 0.6B parameters
- 1.7B parameters
- 4B parameters
- 4B-1020 enhanced version
These models achieve strong BRIGHT benchmark scores, outperforming many larger general models.
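The post does not detail how the synthetic training data is generated, but fine-tuning a retriever on such data typically means assembling contrastive examples: a reasoning-intensive query, a document that genuinely answers it, and distractor negatives. A generic sketch of that data structure (negative mining strategy omitted):

```python
from dataclasses import dataclass, field

@dataclass
class TrainingTriple:
    query: str                 # synthetic reasoning-intensive query
    positive: str              # document that actually answers it
    negatives: list = field(default_factory=list)  # similar-looking but unhelpful docs

def build_triples(pairs: list, corpus: list, n_neg: int = 2) -> list:
    """Assemble contrastive triples from (query, positive) pairs.

    Negatives are simply other corpus documents here; a real pipeline
    would use random or hard-negative mining.
    """
    triples = []
    for query, positive in pairs:
        negatives = [d for d in corpus if d != positive][:n_neg]
        triples.append(TrainingTriple(query, positive, negatives))
    return triples
```

Training on triples like these is what teaches the retriever to prefer documents that support the reasoning chain, not just documents that share vocabulary with the query.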
3. Merged Reranker for Final Output
The reranking stage is where DIVER truly differentiates itself.
Instead of ranking documents only by similarity, it integrates two independent signals:
- Traditional retrieval scores
- LLM-based helpfulness scores
The merged reranker evaluates whether a document is not only similar to the query but also helpful for reasoning and answering the question.
This ensures that multi-step reasoning tasks are supported by the most informative documents, reducing noise and improving accuracy.
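One plausible way to merge the two signals is a simple linear interpolation; the post does not specify DIVER's exact merging rule, so the weighting below is an assumption:

```python
def merged_score(retrieval_score: float, helpfulness_score: float,
                 alpha: float = 0.5) -> float:
    # Linear interpolation of the two signals; DIVER's actual
    # merging rule may differ -- this is one plausible scheme.
    return alpha * retrieval_score + (1 - alpha) * helpfulness_score

def rerank(docs: list, retrieval_scores: list, helpfulness_scores: list,
           alpha: float = 0.5) -> list:
    # Sort documents by the merged score, highest first.
    scored = [
        (merged_score(r, h, alpha), d)
        for d, r, h in zip(docs, retrieval_scores, helpfulness_scores)
    ]
    return [d for _, d in sorted(scored, key=lambda x: x[0], reverse=True)]
```

The effect: a document that is merely similar (high retrieval score, low helpfulness) can be outranked by one the LLM judges genuinely useful for answering the question.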
4. Document Pre-Processing
Before retrieval begins, DIVER preprocesses the document corpus to optimize chunking, structure, and relevance. This ensures the system retrieves the most contextually rich sections of documents for reasoning-heavy queries.
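The post does not describe DIVER's exact chunking scheme, but a common approach that preserves reasoning context is a sliding window with overlap, so that an argument spanning a chunk boundary appears intact in at least one chunk:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list:
    """Sliding-window chunking by words with overlap (illustrative;
    DIVER's actual pre-processing may differ)."""
    words = text.split()
    step = max(1, size - overlap)  # guard against non-positive step
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```

With `size=200, overlap=50`, each chunk shares 50 words with its neighbor, trading some index size for robustness at boundaries.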
Benchmark Performance on BRIGHT
The BRIGHT benchmark evaluates retrieval systems across several reasoning-intensive domains, including biology, psychology, robotics, sustainability, economics, theoretical questions, and more.
DIVER consistently outperforms competitive models:
- DIVER scores 41.6 average NDCG
- DIVER V2 reaches 45.8, setting a new state-of-the-art
- The DIVER-Retriever-4B-1020 model achieves strong results even without reranking
These gains demonstrate that injecting reasoning directly into retrieval significantly improves performance.
Key Features of DIVER
Advanced Multi-Stage Reasoning
DIVER’s pipeline integrates reasoning at every stage, improving retrieval quality for complex queries.
Iterative Query Expansion
LLM-driven expansions refine the query to capture deeper meaning and multiple reasoning paths.
Fine-Tuned Retriever Models
Trained on synthetic reasoning data, the retriever understands abstract and multi-step relationships.
Merged Reranker
Combines traditional search signals with LLM judgment for superior ranking.
Open-Source and Extensible
The full codebase, inference pipeline, rerankers, and retriever models are openly available.
Applications of DIVER
DIVER is designed for any domain where queries require deeper reasoning:
- Medical information retrieval
- Scientific research assistance
- Educational problem-solving
- Legal and financial analysis
- Complex question-answering systems
- Agentic RAG frameworks
- Enterprise knowledge search
Any system that must retrieve information with accuracy, context, and reasoning can benefit from DIVER’s architecture.
Conclusion
DIVER represents a major breakthrough in reasoning-aware retrieval. By combining LLM-driven query expansion, a specialized retriever trained for complex tasks, and a merged reranker that evaluates helpfulness, it sets a new performance standard for reasoning-intensive search. Its success on the BRIGHT benchmark underscores the importance of integrating deeper reasoning into retrieval pipelines.
As real-world queries grow more complex, systems like DIVER will shape the next generation of intelligent retrieval and agentic RAG frameworks. Whether used for scientific research, educational platforms, healthcare, or enterprise knowledge systems, DIVER offers a powerful foundation for delivering accurate, reasoned, and contextually rich retrieval.