A Note on Late Interaction Dynamics: Analyzing Targeted Behaviors of Late Interaction Models

Abstract

While Late Interaction models exhibit strong retrieval performance, many of their underlying dynamics remain understudied and may hide performance bottlenecks. In this work, we focus on two aspects of Late Interaction retrieval: the length bias that arises when using multi-vector scoring, and the distribution of similarities beyond the best scores pooled by the MaxSim operator. We analyze these behaviors for state-of-the-art models on the NanoBEIR benchmark. Results show that the theoretical length bias of causal Late Interaction models holds in practice, and that bi-directional models can also suffer from it in extreme cases. We also find no significant similarity trend beyond the top-1 document token, which supports the view that the MaxSim operator efficiently exploits the token-level similarity scores.
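As a rough illustration of the two quantities discussed above, the following minimal sketch computes a MaxSim score and the per-query-token similarities beyond the top-1 document token. It is not the implementation used in this work: the function names (`maxsim_score`, `topk_similarities`), the use of cosine similarity, and the random embeddings are all assumptions made for the example. It also shows one simple reading of the length effect: if the embeddings of existing document tokens stay fixed when tokens are appended (an assumption here), a maximum over more tokens can only stay equal or grow.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def similarity_matrix(query_emb: np.ndarray, doc_emb: np.ndarray) -> np.ndarray:
    """Token-level cosine similarities, shape (n_query_tokens, n_doc_tokens)."""
    return l2_normalize(query_emb) @ l2_normalize(doc_emb).T


def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Sum over query tokens of the best-matching document token similarity."""
    sim = similarity_matrix(query_emb, doc_emb)
    return float(sim.max(axis=1).sum())


def topk_similarities(query_emb: np.ndarray, doc_emb: np.ndarray, k: int) -> np.ndarray:
    """For each query token, the k highest document-token similarities, descending.

    Column 0 is what MaxSim keeps; columns 1..k-1 are the similarities
    "beyond the top-1" that MaxSim discards.
    """
    sim = similarity_matrix(query_emb, doc_emb)
    return -np.sort(-sim, axis=1)[:, :k]


# Hypothetical data: random vectors stand in for real model embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))    # 4 query tokens, dimension 8
doc = rng.normal(size=(6, 8))      # 6 document tokens
extra = rng.normal(size=(10, 8))   # 10 appended document tokens

# Assumption: appending tokens leaves the original token embeddings unchanged.
longer_doc = np.concatenate([doc, extra], axis=0)

short_score = maxsim_score(query, doc)
long_score = maxsim_score(query, longer_doc)
top3 = topk_similarities(query, doc, k=3)

print(f"score with 6 tokens:  {short_score:.4f}")
print(f"score with 16 tokens: {long_score:.4f}")
print("top-3 similarities per query token:")
print(top3)
```

In this toy setup the longer document never scores lower than the shorter one, which is the monotonicity behind a length bias under the stated assumption. For bi-directional encoders the appended tokens would change the existing embeddings, so this sketch does not cover that case.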