KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

Abstract

As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment efficiency as well as flexibility. We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling. Built on an encoder-decoder architecture, KaLM-Reranker-V1 uses the encoder to pre-encode passages with Matryoshka embedding pooling, while the decoder models the system instruction, user instruction, and query intent; cross-attention then captures relevance between the query context and passage representations. This design makes KaLM-Reranker-V1 efficient through decoupled passage encoding, yet not late interaction, by preserving rich relevance modeling through cross-attention. We instantiate KaLM-Reranker-V1 in three sizes, Nano, Small, and Large, with 0.27B, 1B, and 4B activated parameters, respectively. Extensive experiments on BEIR, MIRACL, and LMEB demonstrate that KaLM-Reranker-V1 achieves strong reranking performance with superior efficiency. On BEIR, KaLM-Reranker-V1 achieves state-of-the-art performance, on par with strong industrial models such as the Qwen3-Reranker series; on MIRACL, despite not being extensively trained on multilingual data, KaLM-Reranker-V1 still shows excellent reranking performance. Moreover, on LMEB, reranking models demonstrate a clear advantage, with even the 0.27B Nano model remaining competitive with 7-12B embedding models.
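
The decoupling described above can be illustrated with a toy sketch. It is not the KaLM-Reranker-V1 implementation: the hash-based `embed_token`, the chunked mean pooling in `pool` (a placeholder for Matryoshka embedding pooling, whose details the abstract does not give), the single attention step, and all function names are assumptions made for illustration. The point it shows is structural: passages are encoded once without seeing the query, and the query side then attends over the cached passage vectors rather than comparing independent embeddings.

```python
import hashlib
import math

DIM = 16  # toy hidden size, chosen only for illustration


def embed_token(token):
    """Hypothetical stand-in for a learned embedding: hash bytes -> unit vector."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    vec = [(b - 127.5) / 127.5 for b in digest[:DIM]]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def pool(token_vecs, num_slots):
    """Compress token vectors into at most num_slots vectors (chunked mean pooling).

    Assumption: a placeholder for the paper's Matryoshka embedding pooling.
    """
    n = len(token_vecs)
    slots = min(num_slots, n)
    pooled = []
    for s in range(slots):
        chunk = token_vecs[s * n // slots:(s + 1) * n // slots]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def encode_passage(text, num_slots=4):
    """Offline, query-independent step: the output can be cached per passage."""
    return pool([embed_token(t) for t in text.lower().split()], num_slots)


def encode_query(instruction, query):
    """Online step: the query side sees the instruction and the query text."""
    return [embed_token(t) for t in f"{instruction} {query}".lower().split()]


def cross_attention_score(query_vecs, passage_vecs):
    """Each query vector attends over the cached passage vectors (softmax of
    scaled dot products); the score averages query-context agreement."""
    total = 0.0
    for q in query_vecs:
        logits = [sum(a * b for a, b in zip(q, p)) / math.sqrt(DIM) for p in passage_vecs]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        z = sum(weights)
        context = [sum(w * p[i] for w, p in zip(weights, passage_vecs)) / z for i in range(DIM)]
        total += sum(a * b for a, b in zip(q, context))
    return total / len(query_vecs)


def rerank(instruction, query, cache):
    """Score every cached passage against one query; highest score first."""
    q = encode_query(instruction, query)
    return sorted(cache, key=lambda pid: cross_attention_score(q, cache[pid]), reverse=True)


# Passages are encoded once, then reused for any number of queries.
passages = {
    "p1": "cross attention lets the decoder read encoder states",
    "p2": "late interaction models compare token embeddings after independent encoding",
}
cache = {pid: encode_passage(text) for pid, text in passages.items()}
print(rerank("rank passages by relevance", "how does cross attention work", cache))
```

Because `encode_passage` never takes the query as input, its cost is paid once per passage, and the per-query cost depends only on the query length and the number of pooled passage vectors. With random hash embeddings the resulting order carries no semantic meaning; only the data flow is intended to mirror the design.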