New benchmarks comparing leading reranker models show ZeroEntropy's zerank-1 and zerank-2 models achieving top performance in relevance, while Jina Reranker v3 stands out for its balance of speed and accuracy. These evaluations provide critical insights for developers building advanced search and Retrieval-Augmented Generation (RAG) systems. The tests, conducted across various datasets and metrics, highlight the diverse strengths of modern rerankers in improving search result quality.
Overall Leaders Emerge
Recent analyses consistently place ZeroEntropy's reranker models at the forefront for overall relevance. In a November 2025 benchmark, zerank-1 achieved the highest ELO score, indicating a superior ability to return relevant results as judged by a large language model (LLM). The model was consistently strong across diverse data types, including finance, business, essays, web, facts, and science. ZeroEntropy's zerank-2, released in November 2025, also leads in instruction handling and calibrated scores, offering approximately 60ms latency for multilingual search.[agentset+2]
Voyage Rerank 2.5 followed ZeroEntropy closely in quality, offering comparable relevance at roughly half the latency. This makes Voyage 2.5 a well-balanced choice for production environments where both speed and quality are crucial. Cohere Rerank 4, launched in December 2025, also emerged as a powerful enterprise-focused solution, offering deep reasoning capabilities and a context window four times larger than its predecessor's. It supports over 100 languages and is designed for large-scale, personalized search in global organizations.[agentset+3]
Balancing Speed and Accuracy
The trade-off between speed and accuracy remains a key consideration in reranker selection. Jina Reranker v3 struck a compelling balance, achieving an 81.33% Hit@1 score with a latency of 188ms in a February 2026 benchmark. This performance makes Jina a strong contender for applications requiring sub-200ms total latency per query. Jina Reranker v2, updated in 2024/2025, is also noted for its speed-focused cross-encoder architecture, offering high throughput and optimization for code retrieval. It is reported to be 15 times faster than competitors such as bge-v2-m3.[research+3]
Another notable model, gte-reranker-modernbert-base, achieved an impressive 83.00% Hit@1 accuracy in the same benchmark. Remarkably, this 149-million-parameter model matched the Hit@1 accuracy of the much larger nemotron-rerank-1b, which has 1.2 billion parameters. While nemotron-rerank-1b scored slightly higher on metrics like MRR@10 and Hit@10, the smaller gte-reranker-modernbert-base delivers the same top-line accuracy for applications where the first result is paramount. Nemotron's latency was also higher, at 243ms versus Jina's 188ms.[research+3]
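For readers unfamiliar with the metrics quoted above, the following sketch shows how Hit@1, Hit@k, and MRR@k are typically computed from a reranker's ranked output. The ranked lists here are invented toy data, not results from any model discussed in this article.

```python
# Toy computation of the benchmark metrics cited above (Hit@1, Hit@10, MRR@10).
# Each "run" is a ranked list of document IDs plus the one relevant ID.

def hit_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def reciprocal_rank_at_k(ranked_ids, relevant_id, k):
    """1/rank of the relevant document if it is in the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k=10):
    """Average Hit@1, Hit@k, and MRR@k over (ranked_ids, relevant_id) pairs."""
    n = len(runs)
    return {
        "Hit@1": sum(hit_at_k(r, rel, 1) for r, rel in runs) / n,
        f"Hit@{k}": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        f"MRR@{k}": sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs) / n,
    }

# Three toy queries: the relevant doc lands at ranks 1, 2, and 4.
runs = [
    (["d3", "d1", "d7"], "d3"),
    (["d2", "d9", "d4"], "d9"),
    (["d5", "d6", "d8", "d0"], "d0"),
]
print(evaluate(runs, k=10))
```

A benchmark then averages these per-query values across the whole test set, which is why a 20-point Hit@1 lift translates directly into more queries whose top result is correct.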
Key Model Showdowns
Several other models demonstrated strong performance in specific areas. The Qwen3-Reranker-4B is highlighted as an optimal choice for balanced performance and cost-efficiency in 2026. Part of the Qwen3 series, this 4-billion-parameter model excels in understanding long texts with a 32k context length and supports over 100 languages. It is engineered to significantly improve search relevance by re-ordering documents based on a query.[siliconflow+4]
For open-source solutions, the bge-reranker-large v2 consistently ranked among the best for quality. Its smaller sibling, bge-reranker-base v2, offered about 90% of the quality at roughly half the cost, making it an efficient option. These models are strong choices for self-hosted deployments.[medium+2]
MonoT5-3B, a text-to-text reranker, also delivered top-tier quality but at a higher computational cost and latency. It is particularly effective for specialized content like legal or biomedical documents. In contrast, models like CTXL-Rerank v2 showed specialized strengths, performing well in science and facts but less consistently across other domains. The mxbai_rerank_xsmall model, with 70 million parameters, offered only a marginal 2 percentage point improvement over a baseline without reranking, suggesting it may lack the capacity for nuanced relevance judgments on longer texts.[medium+4]
The Role of Model Size and Open Source
A significant finding from recent benchmarks is that model size does not always dictate reranker quality. The gte-reranker-modernbert-base, at 149 million parameters, performed as well as the 1.2-billion-parameter nemotron-rerank-1b on Hit@1 accuracy. This suggests that smaller models can be highly effective for many production systems, potentially reducing computational overhead without sacrificing primary accuracy.[research+1]
The availability of robust open-source rerankers, such as the Jina Reranker v2 and bge-reranker models, provides developers with greater control and customization options. These open-source alternatives offer strong performance, enabling teams to deploy powerful reranking capabilities without relying solely on proprietary APIs. However, proprietary options like Cohere often provide slightly better accuracy and managed infrastructure, appealing to enterprises prioritizing ease of deployment and state-of-the-art performance.[reddit+5]
Future of Reranking Technology
Rerankers are becoming an indispensable component in modern search and RAG pipelines, significantly boosting retrieval quality by refining initial search results. Implementing a reranking stage can lead to substantial improvements, with the best models lifting Hit@1 scores by over 20 percentage points. This means a considerable increase in the number of queries that return the correct document as the top result.[research+4]
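The two-stage pattern described above can be sketched in a few lines. The scoring functions here are deliberately simple stand-ins (word overlap for first-stage retrieval, and a toy "reranker" that jointly scores each query–document pair with a phrase-match bonus); a real pipeline would replace the `rerank` function with calls to a model such as those benchmarked above.

```python
# Minimal sketch of a retrieve-then-rerank pipeline with stand-in scorers.
# All documents and scoring heuristics are illustrative, not a real system.

def first_stage_retrieve(query, docs, top_n=3):
    """Cheap candidate generation: rank docs by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [d for _, d in scored[:top_n]]

def rerank(query, candidates):
    """Stand-in reranker: rescore each (query, doc) pair jointly,
    rewarding documents that contain the full query as a phrase."""
    def score(doc):
        overlap = len(set(query.lower().split()) & set(doc.lower().split()))
        phrase_bonus = 5 if query.lower() in doc.lower() else 0
        return overlap + phrase_bonus
    return sorted(candidates, key=score, reverse=True)

docs = [
    "search quality can improve with good embeddings",
    "vector search retrieves candidate documents",
    "a reranking stage can improve search quality",
    "unrelated document about cooking",
]
query = "improve search quality"
candidates = first_stage_retrieve(query, docs)
print(rerank(query, candidates)[0])
```

In this toy run the first-stage retriever ties two documents on word overlap and surfaces the wrong one first; the rerank step, which looks at the query and document together, promotes the better match to the top — the same mechanism by which real rerankers lift Hit@1.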
The continued development focuses on improving accuracy, reducing latency, and enhancing multilingual support. As large language models become more integrated into search systems, rerankers will play an even more critical role in ensuring the precision and relevance of information retrieved, ultimately enhancing user satisfaction across diverse applications.[siliconflow+2]