EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

Abstract

With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution, but it allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus have different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer allows branching to multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that, across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves up to 37% improvement in Pass@k compared with Full Parallel Sampling.
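The branching rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed entropy threshold, the rule of adding one path per high-entropy step, and the `max_paths` cap are all assumptions made for illustration, since the abstract does not specify them. The sketch computes the Shannon entropy of each next-token distribution and opens a new reasoning path only when that entropy exceeds the threshold and the budget allows.

```python
import math
from typing import List, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_branch(
    logits: Sequence[float],
    threshold: float,   # hypothetical entropy cutoff, not taken from the paper
    active_paths: int,
    max_paths: int,     # hypothetical per-prompt budget cap
) -> bool:
    """Branch only at a high-entropy token, and only while budget remains."""
    return active_paths < max_paths and token_entropy(logits) > threshold


def count_paths(
    step_logits: List[Sequence[float]],
    threshold: float,
    max_paths: int,
) -> int:
    """Walk the per-step logits of one sequence and count opened paths.

    Assumption: each high-entropy step adds exactly one alternative path.
    """
    paths = 1
    for logits in step_logits:
        if should_branch(logits, threshold, paths, max_paths):
            paths += 1
    return paths


if __name__ == "__main__":
    confident = [10.0, 0.0, 0.0, 0.0]   # sharply peaked distribution
    uncertain = [0.0, 0.0, 0.0, 0.0]    # uniform distribution
    steps = [confident, uncertain, uncertain, confident]
    print(count_paths(steps, threshold=1.0, max_paths=4))
```

Under these assumptions, a prompt whose tokens are all predicted confidently never branches and consumes a single path, while a prompt with uncertain steps uses more of the budget. For reference, a uniform distribution over four tokens has entropy ln 4, roughly 1.386 nats, so it exceeds the illustrative threshold of 1.0.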
