KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

June 22, 2026
Authors: Xinping Zhao, Jiaxin Xu, Ziqi Dai, Xin Zhang, Shouzheng Huang, Danyu Tang, Xinshuo Hu, Meishan Zhang, Baotian Hu, Min Zhang
cs.AI

Abstract

As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment efficiency as well as flexibility. We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling. Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling, while the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query context and passage representations. This design makes KaLM-Reranker-V1 efficient through decoupled passage encoding, yet not late interaction, by preserving rich relevance modeling through cross-attention. We instantiate KaLM-Reranker-V1 in three sizes, Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters, respectively. Extensive experiments on BEIR, MIRACL, and LMEB demonstrate that KaLM-Reranker-V1 achieves strong reranking performance with superior efficiency. On BEIR, KaLM-Reranker-V1 achieves state-of-the-art performance, on par with strong industrial models such as the Qwen3-Reranker series; on MIRACL, despite not being extensively trained on multilingual data, KaLM-Reranker-V1 still shows excellent reranking performance. Moreover, on LMEB, reranking models demonstrate a clear advantage, with even the 0.27B Nano model remaining competitive with 7-12B embedding models.
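
To make the decoupling described in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a stand-in encoder built from deterministic hash-based vectors, and it approximates "Matryoshka embedding pooling" as mean-pooling tokens into a few slots and keeping only the leading dimensions; both choices are illustrative assumptions. All function names (`toy_token_vector`, `encode_passage`, `encode_query`, `cross_attention_score`, `rerank`) are hypothetical. The point is only the data flow: passages are compressed once offline, and at query time the query representation attends over the cached passage slots.

```python
import math
import zlib

DIM = 16  # full width of the toy embeddings (assumption, not from the paper)


def toy_token_vector(token, dim=DIM):
    """Deterministic pseudo-embedding; a stand-in for a trained encoder."""
    vec = []
    for i in range(dim):
        h = zlib.crc32(f"{token}:{i}".encode("utf-8"))
        vec.append((h % 2001) / 1000.0 - 1.0)  # value in [-1, 1]
    return vec


def encode_passage(passage, num_slots=2, keep_dim=8):
    """Offline step: encode a passage once, then compress it.

    Compression here (an assumption): mean-pool tokens into at most
    `num_slots` chunks and keep only the first `keep_dim` dimensions.
    """
    vectors = [toy_token_vector(t)[:keep_dim] for t in passage.lower().split()]
    if not vectors:
        return [[0.0] * keep_dim]
    chunk = math.ceil(len(vectors) / num_slots)
    slots = []
    for start in range(0, len(vectors), chunk):
        group = vectors[start:start + chunk]
        slots.append([sum(col) / len(group) for col in zip(*group)])
    return slots


def encode_query(query, keep_dim=8):
    """Online step: the query side is computed independently of any passage."""
    return [toy_token_vector(t)[:keep_dim] for t in query.lower().split()]


def cross_attention_score(query_vecs, passage_slots):
    """Each query vector attends over the cached passage slots (softmax)."""
    if not query_vecs:
        return 0.0
    scale = math.sqrt(len(passage_slots[0]))
    total = 0.0
    for q in query_vecs:
        logits = [sum(a * b for a, b in zip(q, s)) / scale for s in passage_slots]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        norm = sum(weights)
        # Attention-weighted similarity is this query token's contribution.
        total += sum(w / norm * l for w, l in zip(weights, logits))
    return total / len(query_vecs)


def rerank(query, cache):
    """Score every cached passage against the query, highest score first."""
    q = encode_query(query)
    scored = [(pid, cross_attention_score(q, slots)) for pid, slots in cache.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


passages = {
    "p1": "decoupled passage encoding makes reranking fast",
    "p2": "a recipe for tomato soup with basil",
    "p3": "cross attention models relevance between query and passage",
}
# Built once, offline; reused for every incoming query.
cache = {pid: encode_passage(text) for pid, text in passages.items()}
ranking = rerank("fast reranking", cache)
print(ranking)
```

Because the vectors are untrained, the resulting order carries no meaning. What the sketch shows is that `cache` never has to be recomputed per query, unlike a reranker that jointly encodes each query-passage pair, while the scoring step still uses attention over passage representations rather than a single dot product between two pooled vectors.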