Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

Abstract

Tokens serve as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling and operate primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy, compared with 6% for the token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal. They highlight the potential of LenVM both as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
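
To make the value formulation concrete, the following is a minimal sketch of how a constant negative per-token reward produces a bounded, monotone target at every token position. The reward of -1, the discount factor of 0.99, and both function names are illustrative assumptions; the abstract does not specify the values or the implementation used by LenVM.

```python
import math


def length_value_targets(num_tokens, gamma=0.99, reward=-1.0):
    """Discounted return at each position of a finished generation.

    targets[t] is the return when (num_tokens - t) tokens are still to be
    generated. gamma and reward are assumed values, not taken from the paper.
    """
    targets = [0.0] * (num_tokens + 1)  # targets[num_tokens] = 0: nothing left
    # Bellman backup from the end of the sequence toward the prompt boundary.
    for t in range(num_tokens - 1, -1, -1):
        targets[t] = reward + gamma * targets[t + 1]
    return targets


def remaining_length_from_value(value, gamma=0.99, reward=-1.0):
    """Invert value = reward * (1 - gamma**n) / (1 - gamma) to recover n."""
    ratio = 1.0 - value * (1.0 - gamma) / reward  # equals gamma**n
    return math.log(ratio) / math.log(gamma)


if __name__ == "__main__":
    targets = length_value_targets(200)
    # Value at the prompt boundary, and the remaining length it implies.
    print(targets[0], remaining_length_from_value(targets[0]))
```

Under these assumptions every target lies in the interval (reward / (1 - gamma), 0], and a more negative value always corresponds to more remaining tokens, which is what allows a predicted value to be read back as a length estimate. The targets come directly from the sampled sequence, so no human annotation is needed.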
