AdaptThink: Reasoning Models Can Learn When to Think
May 19, 2025
Authors: Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li
cs.AI
Abstract
Recently, large reasoning models have achieved impressive performance on
various tasks by employing human-like deep thinking. However, the lengthy
thinking process substantially increases inference overhead, making efficiency
a critical bottleneck. In this work, we first demonstrate that NoThinking,
which prompts the reasoning model to skip thinking and directly generate the
final solution, is a better choice for relatively simple tasks in terms of both
performance and efficiency. Motivated by this, we propose AdaptThink, a novel
RL algorithm to teach reasoning models to choose the optimal thinking mode
adaptively based on problem difficulty. Specifically, AdaptThink features two
core components: (1) a constrained optimization objective that encourages the
model to choose NoThinking while maintaining the overall performance; (2) an
importance sampling strategy that balances Thinking and NoThinking samples
during on-policy training, thereby enabling cold start and allowing the model
to explore and exploit both thinking modes throughout the training process. Our
experiments indicate that AdaptThink significantly reduces the inference costs
while further enhancing performance. Notably, on three math datasets,
AdaptThink reduces the average response length of DeepSeek-R1-Distill-Qwen-1.5B
by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive
thinking-mode selection for optimizing the balance between reasoning quality
and efficiency. Our codes and models are available at
https://github.com/THU-KEG/AdaptThink.
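
To make the two components in the abstract concrete, the sketch below shows one plausible way an AdaptThink-style update could be wired up in PyTorch. It is a minimal illustration based only on the abstract, not the authors' released code: the names `adaptthink_advantage` and `adaptthink_loss`, the NoThinking bonus weight `delta`, the per-prompt reference reward `ref_reward`, and the 50/50 Thinking/NoThinking sampling mixture are all assumptions made for exposition.

```python
# Illustrative sketch only -- not the authors' released implementation.
# Assumptions: a scalar reward in [0, 1] per sampled response, a precomputed
# reference reward `ref_reward` per prompt (e.g., the pre-RL model's accuracy),
# and rollouts drawn from a mixture that produces Thinking and NoThinking
# responses with equal probability.
import torch


def adaptthink_advantage(reward: torch.Tensor,
                         ref_reward: torch.Tensor,
                         is_nothinking: torch.Tensor,
                         delta: float = 0.05) -> torch.Tensor:
    """Surrogate for the constrained objective: grant a bonus `delta` to
    NoThinking responses while penalizing any drop below the reference reward,
    so the model prefers NoThinking only when performance is maintained."""
    return (reward - ref_reward) + delta * is_nothinking.float()


def adaptthink_loss(policy_logprobs: torch.Tensor,    # log pi_theta(y|x), per sequence
                    sampling_logprobs: torch.Tensor,  # log prob under the 50/50 mixture
                    reward: torch.Tensor,
                    ref_reward: torch.Tensor,
                    is_nothinking: torch.Tensor) -> torch.Tensor:
    """Importance-sampled policy-gradient loss.

    Sampling from an even Thinking/NoThinking mixture guarantees that both
    modes appear in every batch from the very first step (addressing the
    cold-start issue mentioned in the abstract), while the ratio
    pi_theta / pi_mix corrects the gradient back toward the on-policy objective.
    """
    advantage = adaptthink_advantage(reward, ref_reward, is_nothinking)
    ratio = torch.exp(policy_logprobs - sampling_logprobs.detach())
    # The gradient flows through `ratio`, i.e., through log pi_theta.
    return -(ratio * advantage.detach()).mean()
```

One way such a mixture could be realized in practice is to prefill or forbid an empty thinking segment (e.g., an immediate closing think token) for half of the rollouts, but this detail is not specified in the abstract and is stated here only as a possible implementation choice.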