VeriThinker: Learning to Verify Makes Reasoning Model Efficient
May 23, 2025
Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
cs.AI
Abstract
Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought (CoT) reasoning. However, their tendency to overthink leads to unnecessarily lengthy reasoning chains, dramatically increasing inference costs. To mitigate this issue, we introduce VeriThinker, a novel approach for CoT compression. Unlike conventional methods that fine-tune LRMs directly on the original reasoning task using synthetic concise CoT data, we innovatively fine-tune the model solely through an auxiliary verification task. By training LRMs to accurately verify the correctness of CoT solutions, the LRMs inherently become more discerning about the necessity of subsequent self-reflection steps, thereby effectively suppressing overthinking. Extensive experiments validate that VeriThinker substantially reduces reasoning chain lengths while maintaining or even slightly improving accuracy. When applied to DeepSeek-R1-Distill-Qwen-7B, our approach reduces reasoning tokens on MATH500 from 3790 to 2125 while improving accuracy by 0.8% (94.0% to 94.8%), and on AIME25, tokens decrease from 14321 to 10287 with a 2.1% accuracy gain (38.7% to 40.8%). Additionally, our experiments demonstrate that VeriThinker also generalizes zero-shot to speculative reasoning. Code is available at https://github.com/czg1225/VeriThinker.
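To make the auxiliary-verification idea concrete, here is a minimal sketch, assuming a standard Hugging Face causal-LM setup, of what fine-tuning on a verification task could look like: the model is trained to emit a correctness verdict for a (question, candidate CoT solution) pair, with the loss applied only to the verdict tokens. The prompt template, the " Yes"/" No" label tokens, and the single-example training step are hypothetical illustrations, not the authors' exact setup from the paper or repository.

```python
# Sketch: fine-tune a reasoning model on an auxiliary verification task
# (judge a CoT solution) instead of on concise-CoT generation targets.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # model used in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

def build_verification_example(question: str, cot_solution: str, is_correct: bool):
    """Turn a (question, CoT solution, correctness label) triple into a
    causal-LM training example; loss is applied only to the verdict."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solution:\n{cot_solution}\n\n"
        "Is the candidate solution correct? Answer:"
    )
    verdict = " Yes" if is_correct else " No"  # hypothetical label tokens
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids[0]
    verdict_ids = tokenizer(verdict, add_special_tokens=False,
                            return_tensors="pt").input_ids[0]
    input_ids = torch.cat([prompt_ids, verdict_ids])
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = -100  # mask prompt tokens out of the loss
    return input_ids, labels

# One optimization step on a single example (full fine-tuning for brevity;
# the paper does not specify this exact training loop).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
input_ids, labels = build_verification_example(
    "What is 17 * 24?",
    "17 * 24 = 17 * 25 - 17 = 425 - 17 = 408.",
    is_correct=True,
)
out = model(input_ids=input_ids.unsqueeze(0), labels=labels.unsqueeze(0))
out.loss.backward()
optimizer.step()
```

The point of the sketch is the task substitution the abstract describes: no synthetic concise-CoT targets are needed, only correctness labels for existing solutions, which is what the authors credit with making the model more discerning about unnecessary self-reflection steps.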