VeriThinker: Learning to Verify Makes Reasoning Model Efficient
May 23, 2025
Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
cs.AI
Abstract
Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought
(CoT) reasoning. However, their tendency to overthink leads to unnecessarily
lengthy reasoning chains, dramatically increasing inference costs. To mitigate
this issue, we introduce VeriThinker, a novel approach for CoT compression.
Unlike conventional methods that fine-tune LRMs directly on the original
reasoning task using synthetic concise CoT data, we instead fine-tune the
model solely on an auxiliary verification task. By training LRMs to
accurately verify the correctness of CoT solutions, the LRMs inherently become
more discerning about the necessity of subsequent self-reflection steps,
thereby effectively suppressing overthinking. Extensive experiments validate
that VeriThinker substantially reduces reasoning chain lengths while
maintaining or even slightly improving accuracy. When applied to
DeepSeek-R1-Distill-Qwen-7B, our approach reduces reasoning tokens on MATH500
from 3790 to 2125 while improving accuracy by 0.8% (94.0% to 94.8%), and on
AIME25, tokens decrease from 14321 to 10287 with a 2.1% accuracy gain (38.7% to
40.8%). Additionally, our experiments demonstrate that VeriThinker also
generalizes zero-shot to speculative reasoning. Code is available at
https://github.com/czg1225/VeriThinker.
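The abstract's key move is fine-tuning solely on CoT-solution verification rather than on compressed reasoning traces. As a rough illustration of what such a setup could look like, the sketch below frames verification as standard supervised fine-tuning of a causal LM, with the loss applied only to a Yes/No verdict; the prompt template, verdict tokens, dataset fields, and single-example step are assumptions for illustration, not the authors' actual recipe (see the linked repository for that).

```python
# Minimal sketch of verification-task fine-tuning. Illustrative only: the
# prompt template, verdict words, and training loop below are assumptions,
# not VeriThinker's exact recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # base LRM used in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

def build_example(question: str, cot_solution: str, is_correct: bool) -> dict:
    """Turn one (question, CoT solution, correctness) triple into a
    causal-LM training example; loss is applied only to the verdict."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solution:\n{cot_solution}\n\n"
        "Is this solution correct? Answer:"
    )
    verdict = " Yes" if is_correct else " No"  # hypothetical label tokens
    prompt_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    verdict_ids = tokenizer(verdict, add_special_tokens=False).input_ids
    input_ids = prompt_ids + verdict_ids
    # Mask the prompt with -100 so only the verdict tokens contribute to loss.
    labels = [-100] * len(prompt_ids) + verdict_ids
    return {"input_ids": input_ids, "labels": labels}

# One optimization step on a single example (real training would batch/pad).
example = build_example("What is 2 + 2?", "2 + 2 = 4, so the answer is 4.", True)
batch = {k: torch.tensor([v]) for k, v in example.items()}
loss = model(**batch).loss
loss.backward()
```

Masking the prompt confines the loss to the verdict tokens, so the model learns to judge whether a chain of thought is correct without ever being trained to reproduce a (possibly overlong) chain itself, which is consistent with the abstract's claim that verification skill, not imitation of short CoTs, is what suppresses overthinking.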