

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

April 29, 2026
作者: Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang
cs.AI

Abstract

The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact-length-matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency: on GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy, compared to 6% for the token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting LenVM's potential as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
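The value formulation in the abstract — a constant negative per-token reward whose discounted return is a bounded, monotone proxy for the remaining horizon — can be sketched numerically. This is a minimal illustration, not the paper's implementation: the discount `gamma = 0.99` and the reward of `-1.0` per token are assumed values chosen for the example, and the function names are hypothetical.

```python
import math

def length_value_targets(seq_len: int, gamma: float = 0.99, reward: float = -1.0):
    """For each token position t, compute the discounted return
    G_t = sum_{k=0}^{H-1} gamma^k * reward, where H = seq_len - t is the
    number of tokens still to be generated. G_t lies in the bounded
    interval (reward / (1 - gamma), 0] and decreases monotonically as H
    grows, so it is a monotone proxy for the remaining generation length."""
    targets = []
    for t in range(seq_len):
        remaining = seq_len - t
        # Closed form of the finite geometric series above.
        g = reward * (1 - gamma ** remaining) / (1 - gamma)
        targets.append(g)
    return targets

def horizon_from_value(g: float, gamma: float = 0.99, reward: float = -1.0):
    """Invert the closed form to recover the remaining horizon H from a
    predicted value g: gamma**H = 1 - g * (1 - gamma) / reward."""
    return math.log(1 - g * (1 - gamma) / reward) / math.log(gamma)
```

Because the mapping from horizon to value is strictly monotone and invertible, a model trained to regress these targets implicitly predicts remaining length at every token, which is what enables the inference-time length control described above.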