Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
November 2, 2025
Authors: Abdelaziz Bounhar, Hadi Abdine, Evan Dufraisse, Ahmad Chamma, Amr Mohamed, Dani Bouch, Michalis Vazirgiannis, Guokan Shang
cs.AI
Abstract
Large language models (LLMs) trained for step-by-step reasoning often become
excessively verbose, raising inference cost. Standard Reinforcement Learning
with Verifiable Rewards (RLVR) pipelines filter out "easy" problems for
training efficiency, leaving the model to train primarily on harder problems
that require longer reasoning chains. This skews the output length distribution
upward, resulting in a model that conflates "thinking longer" with
"thinking better". In this work, we show that retaining and modestly
up-weighting moderately easy problems acts as an implicit length regularizer.
Exposing the model to solvable short-chain tasks constrains its output
distribution and prevents runaway verbosity. The result is
*emergent brevity for free*: the model learns to solve harder
problems without inflating the output length, despite the absence of
any explicit length penalization. RLVR experiments using this approach on
Qwen3-4B-Thinking-2507 (with a 16k token limit) achieve baseline
pass@1 AIME25 accuracy while generating solutions that are, on average,
nearly half as long. The code is available on GitHub
(https://github.com/MBZUAI-Paris/Frugal-AI), with datasets and models on
Hugging Face
(https://huggingface.co/collections/MBZUAI-Paris/k2-think-mini-68dcfa8b114686a4bd3dc2bc).
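The data-mixture idea in the abstract can be sketched in a few lines: instead of discarding all problems the current policy already solves, keep the moderately easy ones and modestly up-weight them so short-chain, solvable tasks remain in every RLVR batch. The sketch below is illustrative only; the function name, the pass-rate thresholds, and the weight value are assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of difficulty-aware sample weighting for an RLVR
# data mixture. Thresholds and weights are illustrative assumptions.

def build_rlvr_mixture(problems, easy_lo=0.6, easy_hi=0.95, easy_weight=2.0):
    """Return (problem, sampling_weight) pairs for RLVR training.

    problems: list of dicts with a 'pass_rate' in [0, 1], estimated by
    sampling the current policy on each problem and checking answers
    with the verifier.
    """
    mixture = []
    for p in problems:
        r = p["pass_rate"]
        if r >= easy_hi or r == 0.0:
            # Trivial or currently unsolvable: little to no reward
            # gradient signal, so drop as in standard pipelines.
            continue
        elif r >= easy_lo:
            # Moderately easy: keep and modestly up-weight. These
            # short-chain, solvable tasks act as the implicit length
            # regularizer described in the abstract.
            mixture.append((p, easy_weight))
        else:
            # Hard but solvable: standard weight.
            mixture.append((p, 1.0))
    return mixture


problems = [
    {"id": "a", "pass_rate": 1.0},   # trivial       -> dropped
    {"id": "b", "pass_rate": 0.8},   # moderately easy -> up-weighted
    {"id": "c", "pass_rate": 0.2},   # hard          -> weight 1.0
    {"id": "d", "pass_rate": 0.0},   # unsolvable    -> dropped
]
mix = build_rlvr_mixture(problems)
print([(p["id"], w) for p, w in mix])  # [('b', 2.0), ('c', 1.0)]
```

The weights could then feed a weighted sampler in the training loop, so no explicit length penalty ever enters the reward.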