VeriThinker: Learning to Verify Makes Reasoning Model Efficient
May 23, 2025
Authors: Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang
cs.AI
Abstract
Large Reasoning Models (LRMs) excel at complex tasks using Chain-of-Thought (CoT) reasoning. However, their tendency to overthink leads to unnecessarily lengthy reasoning chains, dramatically increasing inference costs. To mitigate this issue, we introduce VeriThinker, a novel approach for CoT compression. Unlike conventional methods that fine-tune LRMs directly on the original reasoning task using synthetic concise CoT data, we innovatively fine-tune the model solely through an auxiliary verification task. By training LRMs to accurately verify the correctness of CoT solutions, the LRMs inherently become more discerning about the necessity of subsequent self-reflection steps, thereby effectively suppressing overthinking. Extensive experiments validate that VeriThinker substantially reduces reasoning chain lengths while maintaining or even slightly improving accuracy. When applied to DeepSeek-R1-Distill-Qwen-7B, our approach reduces reasoning tokens on MATH500 from 3790 to 2125 while improving accuracy by 0.8% (94.0% to 94.8%), and on AIME25, tokens decrease from 14321 to 10287 with a 2.1% accuracy gain (38.7% to 40.8%). Additionally, our experiments demonstrate that VeriThinker also generalizes zero-shot to speculative reasoning. Code is available at https://github.com/czg1225/VeriThinker.
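To make the auxiliary-verification idea concrete, here is a minimal sketch, assuming a standard Hugging Face causal-LM setup, of what fine-tuning on a verification task could look like: the model is trained to emit a correctness verdict for a (question, candidate CoT solution) pair, with the loss applied only to the verdict tokens. The prompt template, the " Yes"/" No" label tokens, and the single-example training step are hypothetical illustrations, not the authors' exact setup from the paper or repository.

```python
# Sketch: fine-tune a reasoning model on an auxiliary verification task
# (judge a CoT solution) instead of on concise-CoT generation targets.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # model used in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

def build_verification_example(question: str, cot_solution: str, is_correct: bool):
    """Turn a (question, CoT solution, correctness label) triple into a
    causal-LM training example; loss is applied only to the verdict."""
    prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solution:\n{cot_solution}\n\n"
        "Is the candidate solution correct? Answer:"
    )
    verdict = " Yes" if is_correct else " No"  # hypothetical label tokens
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids[0]
    verdict_ids = tokenizer(verdict, add_special_tokens=False,
                            return_tensors="pt").input_ids[0]
    input_ids = torch.cat([prompt_ids, verdict_ids])
    labels = input_ids.clone()
    labels[: len(prompt_ids)] = -100  # mask prompt tokens out of the loss
    return input_ids, labels

# One optimization step on a single example (full fine-tuning for brevity;
# the paper does not specify this exact training loop).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
input_ids, labels = build_verification_example(
    "What is 17 * 24?",
    "17 * 24 = 17 * 25 - 17 = 425 - 17 = 408.",
    is_correct=True,
)
out = model(input_ids=input_ids.unsqueeze(0), labels=labels.unsqueeze(0))
out.loss.backward()
optimizer.step()
```

The point of the sketch is the task substitution the abstract describes: no synthetic concise-CoT targets are needed, only correctness labels for existing solutions, which is what the authors credit with making the model more discerning about unnecessary self-reflection steps.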