DIVER: A Multi-Stage Retrieval System for Complex, Reasoning-Intensive Search

Modern information retrieval has evolved far beyond simple keyword search. With the rise of reasoning-heavy queries, traditional retrieval-augmented generation (RAG) pipelines struggle to deliver relevant results. Many queries now demand multi-step reasoning, inference, interpretation of indirect references, and the combination of information from multiple documents. To address this growing challenge, the DIVER framework was developed as a multi-stage retrieval system that integrates deep reasoning into the retrieval process.

DIVER is designed to handle complex, real-world queries that demand more than surface-level keyword matching. It introduces a structured pipeline combining LLM-driven query expansion, reasoning-enhanced retrieval, and an advanced reranking approach. As a result, it achieves state-of-the-art performance on the BRIGHT benchmark, exceeding many existing reasoning-aware retrieval models.

This blog provides an overview of DIVER, explains its architecture, highlights its key features, and explores why it represents a major advancement in retrieval for reasoning-intensive tasks.

What Is DIVER?

DIVER is a multi-stage retrieval framework designed specifically for reasoning-intensive information retrieval. Instead of relying solely on similarity metrics, it incorporates deliberate reasoning steps at multiple stages of the retrieval process.

The framework consists of four core components:

  1. Document pre-processing
  2. LLM-driven iterative query expansion
  3. A reasoning-enhanced retriever
  4. A reranker that merges retrieval scores with LLM-generated helpfulness scores

Together, these stages enable DIVER to capture deeper semantic relationships and produce far more accurate retrieval results, especially on complex datasets.
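The interplay of the four stages can be pictured with a short end-to-end sketch. Every component below is a deliberately simplified stand-in (sentence splitting for pre-processing, string enrichment for expansion, term overlap for the retriever, a keyword check for the LLM helpfulness judge) chosen so the flow runs offline; none of it is DIVER's actual implementation.

```python
def preprocess(corpus):
    """Stage 1: split each document into sentence-level chunks."""
    return [(doc_id, chunk) for doc_id, text in corpus.items()
            for chunk in text.split(". ") if chunk]

def expand_query(query):
    """Stage 2: stand-in for LLM-driven expansion (a trivial enrichment here)."""
    return f"{query} consider related concepts and reasoning steps"

def helpfulness(query, chunk):
    """Placeholder for an LLM-generated helpfulness judgment in [0, 1]."""
    return 1.0 if query.lower().split()[0] in chunk.lower() else 0.0

def retrieve(expanded_query, chunks, k):
    """Stage 3: stand-in retriever scoring chunks by term overlap."""
    terms = set(expanded_query.lower().split())
    scored = [(len(terms & set(chunk.lower().split())), doc_id, chunk)
              for doc_id, chunk in chunks]
    return sorted(scored, reverse=True)[:k]

def rerank(query, candidates):
    """Stage 4: merge the retrieval score with the helpfulness score."""
    return sorted(candidates,
                  key=lambda t: t[0] + helpfulness(query, t[2]),
                  reverse=True)

corpus = {
    "d1": "Coral reefs bleach under heat stress. Warm water expels their algae.",
    "d2": "Interest rates affect housing prices. Inflation erodes savings.",
}
query = "coral bleaching causes"
chunks = preprocess(corpus)
top = rerank(query, retrieve(expand_query(query), chunks, k=3))
print(top[0][1])  # best-scoring chunk comes from document d1
```

The point of the structure, which the sketch preserves, is that reasoning signals enter both before retrieval (expansion) and after it (helpfulness-aware reranking), rather than only at generation time.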

Why Traditional RAG Systems Fail on Complex Queries

Standard RAG systems typically use a simple pipeline:

  • Encode the query
  • Retrieve documents based on embedding similarity
  • Pass retrieved context to an LLM for generation
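The three retrieval steps above reduce to nearest-neighbor search in embedding space. The toy sketch below, using a hand-rolled bag-of-words encoder and cosine similarity purely for illustration, shows both the pipeline's simplicity and its weakness: the highest-similarity document is not necessarily the one that answers the question.

```python
import math

def embed(text, vocab):
    """Bag-of-words vector over a fixed vocabulary (toy stand-in for a real encoder)."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = ["Python lists are mutable", "Tuples in Python are immutable"]
vocab = sorted({w for d in docs for w in d.lower().split()})
query = "are tuples mutable"
best = max(docs, key=lambda d: cosine(embed(query, vocab), embed(d, vocab)))
print(best)  # similarity picks the lists doc, though the query asks about tuples
```

Here surface-level word overlap outweighs what the query actually asks, which is exactly the failure mode on reasoning-heavy queries that the list below describes.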

While effective for fact-based queries, this approach falls short when queries require:

  • Abstract reasoning
  • Multi-step inference
  • Combining multiple concepts
  • Understanding indirect relationships
  • Domain-specific logic

Benchmark data from the BRIGHT dataset demonstrates that many traditional models plateau at low performance levels when reasoning complexity increases. DIVER addresses this by weaving reasoning deeply into every stage of the retrieval process.

The DIVER Architecture: A Four-Stage Pipeline

1. LLM-Driven Query Expansion

DIVER begins by improving the input query through iterative reasoning. The system uses a large language model to expand the original query into a richer, more detailed query representation.

This step helps the retriever account for:

  • Multi-step reasoning
  • Hidden context
  • Implicit references
  • Indirect connections

Unlike traditional query expansion, which focuses on adding synonyms or related words, DIVER’s expansion process generates structured reasoning paths to guide retrieval.
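One way to picture reasoning-path expansion is as a prompt that asks the model to spell out intermediate concepts rather than synonyms. The prompt wording and the `call_llm` parameter below are illustrative assumptions, not DIVER's actual prompts or API; a stub stands in for the LLM so the sketch runs offline.

```python
# Illustrative expansion prompt: asks for reasoning steps, not synonyms.
EXPANSION_PROMPT = """You are expanding a search query for a reasoning-intensive
retrieval system. Do not just add synonyms. Instead:
1. Restate the underlying question.
2. List the intermediate concepts a correct answer must connect.
3. Write one expanded query that includes those reasoning steps.

Original query: {query}
Expanded query:"""

def expand_query(query, call_llm):
    """Ask an LLM (any chat-completion client) for a reasoning-aware expansion."""
    return call_llm(EXPANSION_PROMPT.format(query=query)).strip()

# Stub LLM so the sketch runs offline; a real client call would go here.
stub_llm = lambda prompt: ("why do coral reefs bleach -> heat stress, "
                           "algae expulsion, nutrient loss")
print(expand_query("why do coral reefs bleach", stub_llm))
```

The expanded string then replaces the raw query at retrieval time, so the retriever scores documents against the reasoning path rather than the original keywords alone.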

2. Reasoning-Enhanced Retriever

The retriever in DIVER is not a standard embedding model. It is a specialized retriever fine-tuned on synthetically generated complex queries. These training samples mimic real-world reasoning patterns and help the model understand intricate relationships within the data.

Example improvements include:

  • Higher sensitivity to logical structure
  • Better alignment with expanded queries
  • Stronger performance on domain-specific reasoning tasks

The DIVER-Retriever models come in multiple sizes, including:

  • 0.6B parameters
  • 1.7B parameters
  • 4B parameters
  • 4B-1020 enhanced version

These models achieve strong BRIGHT benchmark scores, outperforming many larger general models.
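The synthetic fine-tuning data mentioned above can be sketched as follows; the prompt wording and the `call_llm` stub are assumptions for illustration, not DIVER's released data-generation code.

```python
# Illustrative prompt for generating one reasoning-intensive training query.
SYNTH_PROMPT = """Given the passage below, write a question that can only be
answered by reasoning over the passage in multiple steps, not by matching
keywords.

Passage: {passage}
Question:"""

def make_training_pair(passage, call_llm):
    """Produce one (query, positive passage) pair for retriever fine-tuning."""
    query = call_llm(SYNTH_PROMPT.format(passage=passage)).strip()
    return {"query": query, "positive": passage}

# Stub LLM so the sketch runs offline.
stub_llm = lambda prompt: "What downstream effect follows from the mechanism described?"
pair = make_training_pair(
    "Rising sea temperatures expel symbiotic algae, which starves the coral.",
    stub_llm,
)
print(pair["query"])
```

Pairs of this shape, generated at scale, are what let a retriever learn to match queries to passages through their reasoning structure rather than shared vocabulary.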

3. Merged Reranker for Final Output

The reranking stage is where DIVER truly differentiates itself. Instead of ranking documents only by similarity, it integrates two independent signals:

  • Traditional retrieval scores
  • LLM-based helpfulness scores

The merged reranker evaluates whether a document is not only similar to the query but also helpful for reasoning and answering the question.

This ensures that multi-step reasoning tasks are supported by the most informative documents, reducing noise and improving accuracy.
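A minimal way to merge the two signals is a weighted blend; the 0.5 weight and the 0-1 score scales below are assumptions for illustration, not DIVER's published formula.

```python
def merged_score(retrieval_score, helpfulness_score, alpha=0.5):
    """Blend normalized retrieval similarity with LLM-judged helpfulness."""
    return alpha * retrieval_score + (1 - alpha) * helpfulness_score

candidates = [
    {"doc": "keyword-heavy but shallow", "sim": 0.92, "help": 0.20},
    {"doc": "less similar but explains the mechanism", "sim": 0.71, "help": 0.95},
]
ranked = sorted(candidates,
                key=lambda c: merged_score(c["sim"], c["help"]),
                reverse=True)
print(ranked[0]["doc"])  # the more helpful document wins despite lower similarity
```

The example makes the trade-off concrete: a document that merely echoes the query's keywords can be demoted below one that actually supports the reasoning chain.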

4. Document Pre-Processing

Before retrieval begins, DIVER preprocesses the document corpus to optimize chunking, structure, and relevance. This ensures the system retrieves the most contextually rich sections of documents for reasoning-heavy queries.
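A common chunking approach for this kind of pre-processing is an overlapping sliding window, so that no idea is cut cleanly in half at a chunk boundary; the window and overlap sizes below are illustrative choices, not DIVER's settings.

```python
def chunk_text(text, window=50, overlap=10):
    """Split text into word windows that overlap so context isn't cut mid-idea."""
    words = text.split()
    step = window - overlap
    return [" ".join(words[i:i + window])
            for i in range(0, max(len(words) - overlap, 1), step)]

# 120 placeholder words -> 3 overlapping chunks of up to 50 words each.
doc = " ".join(f"w{i}" for i in range(120))
chunks = chunk_text(doc)
print(len(chunks), chunks[1].split()[0])  # second chunk starts at word 40
```

Each chunk then carries ten words of shared context with its neighbor, which helps reasoning-heavy queries that depend on sentences near a boundary.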

Benchmark Performance on BRIGHT

The BRIGHT benchmark evaluates retrieval systems across several reasoning-intensive domains, including biology, psychology, robotics, sustainability, economics, theoretical questions, and more.

DIVER consistently outperforms competitive models:

  • DIVER scores 41.6 average NDCG
  • DIVER V2 reaches 45.8, setting a new state-of-the-art
  • The DIVER-Retriever-4B-1020 model achieves strong results even without reranking

These gains demonstrate that injecting reasoning directly into retrieval significantly improves performance.

Key Features of DIVER

Advanced Multi-Stage Reasoning

DIVER’s pipeline integrates reasoning at every stage, improving retrieval quality for complex queries.

Iterative Query Expansion

LLM-driven expansions refine the query to capture deeper meaning and multiple reasoning paths.

Fine-Tuned Retriever Models

Trained on synthetic reasoning data, the retriever understands abstract and multi-step relationships.

Merged Reranker

Combines traditional search signals with LLM judgment for superior ranking.

Open-Source and Extensible

The full codebase, inference pipeline, rerankers, and retriever models are openly available.

Applications of DIVER

DIVER is designed for any domain where queries require deeper reasoning:

  • Medical information retrieval
  • Scientific research assistance
  • Educational problem-solving
  • Legal and financial analysis
  • Complex question-answering systems
  • Agentic RAG frameworks
  • Enterprise knowledge search

Any system that must retrieve information with accuracy, context, and reasoning can benefit from DIVER’s architecture.

Conclusion

DIVER represents a major breakthrough in reasoning-aware retrieval. By combining LLM-driven query expansion, a specialized retriever trained for complex tasks, and a merged reranker that evaluates helpfulness, it sets a new performance standard for reasoning-intensive search. Its success on the BRIGHT benchmark underscores the importance of integrating deeper reasoning into retrieval pipelines.

As real-world queries grow more complex, systems like DIVER will shape the next generation of intelligent retrieval and agentic RAG frameworks. Whether applied to scientific research, educational platforms, healthcare, or enterprise knowledge systems, DIVER offers a powerful foundation for accurate, reasoned, and contextually rich retrieval.
