
AdaptThink: Reasoning Models Can Learn When to Think

May 19, 2025
作者: Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li
cs.AI

Abstract

Recently, large reasoning models have achieved impressive performance on various tasks by employing human-like deep thinking. However, the lengthy thinking process substantially increases inference overhead, making efficiency a critical bottleneck. In this work, we first demonstrate that NoThinking, which prompts the reasoning model to skip thinking and directly generate the final solution, is a better choice for relatively simple tasks in terms of both performance and efficiency. Motivated by this, we propose AdaptThink, a novel RL algorithm that teaches reasoning models to choose the optimal thinking mode adaptively based on problem difficulty. Specifically, AdaptThink features two core components: (1) a constrained optimization objective that encourages the model to choose NoThinking while maintaining overall performance; (2) an importance sampling strategy that balances Thinking and NoThinking samples during on-policy training, thereby enabling cold start and allowing the model to explore and exploit both thinking modes throughout training. Our experiments indicate that AdaptThink significantly reduces inference costs while further enhancing performance. Notably, on three math datasets, AdaptThink reduces the average response length of DeepSeek-R1-Distill-Qwen-1.5B by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive thinking-mode selection for optimizing the balance between reasoning quality and efficiency. Our code and models are available at https://github.com/THU-KEG/AdaptThink.