EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling
October 13, 2025
Authors: Daniel Scalena, Leonidas Zotos, Elisabetta Fersini, Malvina Nissim, Ahmet Üstün
cs.AI
Abstract
With the rise of reasoning language models and test-time scaling methods as a
paradigm for improving model performance, substantial computation is often
required to generate multiple candidate sequences from the same prompt. This
enables exploration of different reasoning paths toward the correct solution;
however, it allocates the same compute budget to each prompt. Grounded in the
assumption that different prompts carry different degrees of complexity, and
thus different computation needs, we propose EAGer, a training-free generation
method that leverages model uncertainty through token-wise entropy distribution
to reduce redundant computation and concurrently improve overall performance.
EAGer allows branching to multiple reasoning paths only in the presence of
high-entropy tokens, and then reallocates the saved compute budget to the
instances where exploration of alternative paths is most needed. We find that
across multiple open-source models on complex reasoning benchmarks such as AIME
2025, EAGer can reallocate the budget without accessing target labels,
achieving the best efficiency-performance trade-off in terms of reasoning
length and Pass@k. When target labels are accessible, EAGer generates up to 65%
fewer tokens (hence saving compute) and achieves up to 37% improvement in
Pass@k compared to full parallel sampling.
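
To make the branching rule concrete, here is a minimal sketch of entropy-gated decoding: at each step the Shannon entropy of the next-token distribution is computed, and an alternative continuation is forked only when that entropy exceeds a threshold. The threshold value, the `max_branches` cap, and the toy `next_logits_fn` stand-in for a real model are all illustrative assumptions; the paper's exact branching criterion and its cross-instance budget reallocation are not reproduced here.

```python
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (nats) of the next-token distribution given raw logits."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def entropy_gated_decode(next_logits_fn, prompt_ids, max_new_tokens=64,
                         entropy_threshold=2.0, max_branches=4):
    """
    Greedy decoding that forks an extra candidate path only at high-entropy
    steps, up to a per-prompt branch cap. `next_logits_fn` maps a token-id
    sequence to next-token logits (a stand-in for a real language model).
    """
    paths = [list(prompt_ids)]
    for _ in range(max_new_tokens):
        new_paths = []
        for path in paths:
            logits = next_logits_fn(path)
            if (token_entropy(logits) > entropy_threshold
                    and len(paths) + len(new_paths) < max_branches):
                # High-entropy token: branch by sampling an alternative token.
                alt = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
                new_paths.append(path + [alt])
            # Low entropy (or branch budget exhausted): follow the greedy token.
            path.append(int(logits.argmax()))
        paths.extend(new_paths)
    return paths

# Toy demo with random logits in place of a real model.
torch.manual_seed(0)
vocab_size = 100
fake_model = lambda ids: torch.randn(vocab_size)
candidates = entropy_gated_decode(fake_model, prompt_ids=[1, 2, 3], max_new_tokens=8)
print(f"{len(candidates)} candidate paths")
```

In this sketch the branch cap plays the role of the per-prompt compute budget: easy prompts whose token entropies stay low never branch, leaving budget that EAGer would redirect to harder instances.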