Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Abstract

Tokens are the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches operate primarily at the coarse-grained sequence level and lack fine-grained length modeling. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy, compared with 6% for the token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
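To make the value formulation concrete, the following is a minimal sketch of how a constant negative per-token reward yields a bounded, monotone target. It is not taken from the released code. It assumes a per-token reward of -(1 - gamma) and a discount gamma of 0.99, neither of which is stated in the abstract, and the function names are hypothetical. Under these assumptions the return with n tokens remaining is -(1 - gamma^n), which lies in (-1, 0] and can be inverted to recover n.

```python
import math


def length_value_targets(num_tokens, gamma=0.99):
    """Discounted-return targets for every position of a finished sequence.

    Assumption (not from the source): each generated token receives the
    reward -(1 - gamma), so the return with n tokens remaining is
    -(1 - gamma ** n), which is bounded in (-1, 0].
    """
    targets = [0.0] * num_tokens
    running = 0.0  # the return after the final token is zero
    for t in reversed(range(num_tokens)):
        # Bellman backup: G_t = r + gamma * G_{t+1}
        running = -(1.0 - gamma) + gamma * running
        targets[t] = running
    return targets


def remaining_from_value(value, gamma=0.99):
    """Invert the closed form: n = log(1 + G) / log(gamma)."""
    return math.log1p(value) / math.log(gamma)
```

Because the targets depend only on where a sequence ends, they can be computed for any sampled generation without annotation. In this sketch, `length_value_targets(5, gamma=0.9)` gives values that move toward zero as fewer tokens remain, and `remaining_from_value` maps the first of them back to approximately 5.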
