AdaptThink: Reasoning Models Can Learn When to Think
May 19, 2025
Authors: Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li
cs.AI
Abstract
Recently, large reasoning models have achieved impressive performance on
various tasks by employing human-like deep thinking. However, the lengthy
thinking process substantially increases inference overhead, making efficiency
a critical bottleneck. In this work, we first demonstrate that NoThinking,
which prompts the reasoning model to skip thinking and directly generate the
final solution, is a better choice for relatively simple tasks in terms of both
performance and efficiency. Motivated by this, we propose AdaptThink, a novel
RL algorithm to teach reasoning models to choose the optimal thinking mode
adaptively based on problem difficulty. Specifically, AdaptThink features two
core components: (1) a constrained optimization objective that encourages the
model to choose NoThinking while maintaining the overall performance; (2) an
importance sampling strategy that balances Thinking and NoThinking samples
during on-policy training, thereby enabling cold start and allowing the model
to explore and exploit both thinking modes throughout the training process. Our
experiments indicate that AdaptThink significantly reduces the inference costs
while further enhancing performance. Notably, on three math datasets,
AdaptThink reduces the average response length of DeepSeek-R1-Distill-Qwen-1.5B
by 53% and improves its accuracy by 2.4%, highlighting the promise of adaptive
thinking-mode selection for optimizing the balance between reasoning quality
and efficiency. Our codes and models are available at
https://github.com/THU-KEG/AdaptThink.
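
To make the two components in the abstract concrete, the sketch below shows one plausible way an AdaptThink-style update could be wired up in PyTorch. It is a minimal illustration based only on the abstract, not the authors' released code: the names `adaptthink_advantage` and `adaptthink_loss`, the NoThinking bonus weight `delta`, the per-prompt reference reward `ref_reward`, and the 50/50 Thinking/NoThinking sampling mixture are all assumptions made for exposition.

```python
# Illustrative sketch only -- not the authors' released implementation.
# Assumptions: a scalar reward in [0, 1] per sampled response, a precomputed
# reference reward `ref_reward` per prompt (e.g., the pre-RL model's accuracy),
# and rollouts drawn from a mixture that produces Thinking and NoThinking
# responses with equal probability.
import torch


def adaptthink_advantage(reward: torch.Tensor,
                         ref_reward: torch.Tensor,
                         is_nothinking: torch.Tensor,
                         delta: float = 0.05) -> torch.Tensor:
    """Surrogate for the constrained objective: grant a bonus `delta` to
    NoThinking responses while penalizing any drop below the reference reward,
    so the model prefers NoThinking only when performance is maintained."""
    return (reward - ref_reward) + delta * is_nothinking.float()


def adaptthink_loss(policy_logprobs: torch.Tensor,    # log pi_theta(y|x), per sequence
                    sampling_logprobs: torch.Tensor,  # log prob under the 50/50 mixture
                    reward: torch.Tensor,
                    ref_reward: torch.Tensor,
                    is_nothinking: torch.Tensor) -> torch.Tensor:
    """Importance-sampled policy-gradient loss.

    Sampling from an even Thinking/NoThinking mixture guarantees that both
    modes appear in every batch from the very first step (addressing the
    cold-start issue mentioned in the abstract), while the ratio
    pi_theta / pi_mix corrects the gradient back toward the on-policy objective.
    """
    advantage = adaptthink_advantage(reward, ref_reward, is_nothinking)
    ratio = torch.exp(policy_logprobs - sampling_logprobs.detach())
    # The gradient flows through `ratio`, i.e., through log pi_theta.
    return -(ratio * advantage.detach()).mean()
```

One way such a mixture could be realized in practice is to prefill or forbid an empty thinking segment (e.g., an immediate closing think token) for half of the rollouts, but this detail is not specified in the abstract and is stated here only as a possible implementation choice.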