

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

October 17, 2025
Authors: Heecheol Yun, Kwangmin Ki, Junghyun Lee, Eunho Yang
cs.AI

Abstract

Ensembling Large Language Models (LLMs) has gained attention as a promising approach to surpass the performance of individual models by leveraging their complementary strengths. In particular, aggregating models' next-token probability distributions to select the next token has been shown to be effective in various tasks. However, while successful for short-form answers, its application to long-form generation remains underexplored. In this paper, we show that using existing ensemble methods in long-form generation requires a careful choice of ensembling positions, since the standard practice of ensembling at every token often degrades performance. We identify two key factors for determining these positions: tokenization mismatch across models and consensus in their next-token probability distributions. Based on this, we propose SAFE (Stable And Fast LLM Ensembling), a framework that selectively ensembles by jointly considering these factors. To further improve stability, we introduce a probability sharpening strategy that consolidates probabilities spread across multiple sub-word tokens representing the same word into a single representative token. Our experiments on diverse benchmarks, including MATH500 and BBH, demonstrate that SAFE outperforms existing methods in both accuracy and efficiency, with gains achieved even when ensembling fewer than 1% of tokens.
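To make the two mechanisms in the abstract concrete, the sketch below illustrates one decoding step of selective ensembling plus probability sharpening in Python. It is a hypothetical rendering under stated assumptions, not the paper's implementation: the boundary_aligned flag, the consensus_threshold value, the "ensemble only where models disagree or are uncertain" decision rule, the prefix-based grouping heuristic in sharpen, and the assumption that both distributions are already mapped onto a shared vocabulary are all illustrative choices.

```python
import torch

def should_ensemble(p_a, p_b, boundary_aligned, consensus_threshold=0.9):
    """Decide whether to ensemble at this position, using the two factors
    the abstract names: tokenization mismatch and next-token consensus.
    The exact rule here is an illustrative assumption."""
    if not boundary_aligned:
        # Token boundaries disagree across models: skip ensembling here.
        return False
    top_a = int(torch.argmax(p_a))
    top_b = int(torch.argmax(p_b))
    agree = top_a == top_b
    confident = max(float(p_a[top_a]), float(p_b[top_b])) >= consensus_threshold
    # Ensemble only where aggregation could plausibly change the outcome;
    # the paper's exact criterion may differ.
    return not (agree and confident)

def sharpen(probs, tokenizer, k=20):
    """Consolidate probability mass spread over sub-word prefixes of the
    same word onto a single representative token. Grouping top-k candidates
    by string prefix is a heuristic stand-in for the paper's procedure."""
    sharpened = probs.clone()
    top_ids = torch.topk(probs, k).indices.tolist()
    decoded = [tokenizer.decode([i]) for i in top_ids]
    for i, s_i in enumerate(decoded):
        for j, s_j in enumerate(decoded):
            if i != j and len(s_j) > len(s_i) and s_j.startswith(s_i):
                # s_i is a prefix of s_j: fold its mass into the longer token.
                sharpened[top_ids[j]] += sharpened[top_ids[i]]
                sharpened[top_ids[i]] = 0.0
                break
    return sharpened / sharpened.sum()

def safe_step(p_a, p_b, tokenizer, boundary_aligned):
    """One decoding step: ensemble (with sharpening) only at selected
    positions; elsewhere decode from a single base model. Assumes p_a and
    p_b are already aligned to a shared vocabulary."""
    if should_ensemble(p_a, p_b, boundary_aligned):
        p = 0.5 * (sharpen(p_a, tokenizer) + sharpen(p_b, tokenizer))
    else:
        p = p_a
    return int(torch.argmax(p))
```

Because most positions fall through to the single-model branch, a rule of this shape is consistent with the abstract's observation that gains persist even when fewer than 1% of tokens are ensembled.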