Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
April 29, 2026
作者: Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang
cs.AI
Abstract
Tokens serve as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy, compared to 6% for a token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
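To make the value formulation concrete, the following is a minimal sketch of the discounted-return target the abstract describes: each generated token receives a constant negative reward, so the value at a position is a bounded, discounted sum that decreases monotonically with the number of tokens remaining. The discount factor `gamma` and per-token cost `c` below are illustrative assumptions, not values taken from the paper.

```python
import math

def length_value(remaining_tokens: int, gamma: float = 0.99, c: float = 1.0) -> float:
    """Discounted return when `remaining_tokens` tokens are still to be generated.

    Each future token contributes reward -c, discounted by gamma:
        V(L) = -c * sum_{k=0}^{L-1} gamma**k = -c * (1 - gamma**L) / (1 - gamma)
    The return is bounded in (-c / (1 - gamma), 0] and strictly decreasing
    in L, so it serves as a monotone proxy for the remaining horizon.
    """
    return -c * (1.0 - gamma ** remaining_tokens) / (1.0 - gamma)

def remaining_length(value: float, gamma: float = 0.99, c: float = 1.0) -> float:
    """Invert V(L) to recover the implied remaining token count."""
    return math.log(1.0 + value * (1.0 - gamma) / c) / math.log(gamma)
```

Because the mapping between value and remaining length is monotone and invertible, a predicted value at any token position can be converted back into an estimated number of tokens left, which is what enables budget-aware decoding and total-length prediction from the prompt boundary.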