VeriThinker: Learning to Verify Makes Reasoning Model Efficient
May 23, 2025
Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
cs.AI
Abstract
Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought
(CoT) reasoning. However, their tendency to overthink leads to unnecessarily
lengthy reasoning chains, dramatically increasing inference costs. To mitigate
this issue, we introduce VeriThinker, a novel approach for CoT compression.
Unlike conventional methods that fine-tune LRMs directly on the original
reasoning task using synthetic concise CoT data, we instead fine-tune the
model solely on an auxiliary verification task. By training LRMs to
accurately verify the correctness of CoT solutions, the LRMs inherently become
more discerning about the necessity of subsequent self-reflection steps,
thereby effectively suppressing overthinking. Extensive experiments validate
that VeriThinker substantially reduces reasoning chain lengths while
maintaining or even slightly improving accuracy. When applied to
DeepSeek-R1-Distill-Qwen-7B, our approach reduces reasoning tokens on MATH500
from 3790 to 2125 while improving accuracy by 0.8% (94.0% to 94.8%), and on
AIME25, tokens decrease from 14321 to 10287 with a 2.1% accuracy gain (38.7% to
40.8%). Additionally, our experiments demonstrate that VeriThinker also
generalizes zero-shot to speculative reasoning. Code is available at
https://github.com/czg1225/VeriThinker.
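The abstract's key move is fine-tuning solely on CoT-solution verification rather than on compressed reasoning traces. As a rough illustration of what such a setup could look like, the sketch below frames verification as standard supervised fine-tuning of a causal LM, with the loss applied only to a Yes/No verdict; the prompt template, verdict tokens, dataset fields, and single-example step are assumptions for illustration, not the authors' actual recipe (see the linked repository for that).

```python
# Minimal sketch of verification-task fine-tuning. Illustrative only: the
# prompt template, verdict words, and training loop below are assumptions,
# not VeriThinker's exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # base LRM used in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

def build_example(question: str, cot_solution: str, is_correct: bool) -> dict:
    """Turn one (question, CoT solution, correctness) triple into a
    causal-LM training example; loss is applied only to the verdict."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solution:\n{cot_solution}\n\n"
        "Is this solution correct? Answer:"
    )
    verdict = " Yes" if is_correct else " No"  # hypothetical label tokens
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    verdict_ids = tokenizer(verdict, add_special_tokens=False).input_ids
    input_ids = prompt_ids + verdict_ids
    # Mask the prompt with -100 so only the verdict tokens contribute to loss.
    labels = [-100] * len(prompt_ids) + verdict_ids
    return {"input_ids": input_ids, "labels": labels}

# One optimization step on a single example (real training would batch/pad).
example = build_example("What is 2 + 2?", "2 + 2 = 4, so the answer is 4.", True)
batch = {k: torch.tensor([v]) for k, v in example.items()}
loss = model(**batch).loss
loss.backward()
```

Masking the prompt confines the loss to the verdict tokens, so the model learns to judge whether a chain of thought is correct without ever being trained to reproduce a (possibly overlong) chain itself, which is consistent with the abstract's claim that verification skill, not imitation of short CoTs, is what suppresses overthinking.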