AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
April 30, 2025
Authors: Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen
cs.AI
Abstract
Recently, long-thought reasoning models have achieved strong performance on
complex reasoning tasks, but they often incur substantial inference overhead,
making efficiency a critical concern. Our empirical analysis reveals that the benefit
of using Long-CoT varies across problems: while some problems require elaborate
reasoning, others show no improvement, or even degraded accuracy. This
motivates adaptive reasoning strategies that tailor reasoning depth to the
input. However, prior work primarily reduces redundancy within long reasoning
paths, limiting exploration of more efficient strategies beyond the Long-CoT
paradigm. To address this, we propose a novel two-stage framework for adaptive
and efficient reasoning. First, we construct a hybrid reasoning model by
merging long and short CoT models to enable diverse reasoning styles. Second,
we apply bi-level preference training to guide the model to select suitable
reasoning styles (group-level), and prefer concise and correct reasoning within
each style group (instance-level). Experiments demonstrate that our method
significantly reduces inference costs compared to other baseline approaches,
while maintaining performance. Notably, on five mathematical datasets, the
average length of reasoning is reduced by more than 50%, highlighting the
potential of adaptive strategies to optimize reasoning efficiency in large
language models. Our code is coming soon at https://github.com/StarDewXXX/AdaR1.
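The first stage above builds a hybrid model by merging a long-CoT and a short-CoT model. As a minimal sketch, assuming a simple linear interpolation of matching parameters (a common merging scheme; the paper's exact merging recipe, the `alpha` weight, and the helper name `merge_models` are illustrative assumptions, and plain dicts of floats stand in for real model state dicts):

```python
# Sketch of stage one: merge a long-CoT model and a short-CoT model into a
# hybrid model by linear interpolation of their parameters. Plain dicts of
# floats stand in for torch state_dicts; `alpha` is an assumed mixing weight.

def merge_models(long_cot, short_cot, alpha=0.5):
    """Interpolate parameters: alpha * long + (1 - alpha) * short."""
    assert long_cot.keys() == short_cot.keys(), "architectures must match"
    return {name: alpha * long_cot[name] + (1 - alpha) * short_cot[name]
            for name in long_cot}

# Toy "parameters" for two models sharing one architecture.
long_model = {"w1": 1.0, "w2": -2.0}
short_model = {"w1": 0.0, "w2": 2.0}

hybrid = merge_models(long_model, short_model, alpha=0.5)
print(hybrid)  # {'w1': 0.5, 'w2': 0.0}
```

With `alpha` closer to 1 the hybrid leans toward the long-CoT model's behavior; closer to 0, toward the short-CoT model. The goal, per the abstract, is a single model that can produce both reasoning styles.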
AI-Generated Summary
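The second stage applies bi-level preference training. One illustrative way to construct the required preference pairs (this pairing rule is an assumed reading of "group-level" and "instance-level" preferences, not the paper's exact construction) is: at the group level, prefer answers from the reasoning style that is more accurate on a given problem; at the instance level, prefer shorter correct answers over longer correct ones within the same style.

```python
# Hedged sketch of bi-level preference-pair construction. Each response is
# (style, text, correct); returned pairs are (chosen, rejected).

def bi_level_pairs(responses):
    by_style = {}
    for style, text, correct in responses:
        by_style.setdefault(style, []).append((text, correct))

    def acc(style):
        group = by_style[style]
        return sum(c for _, c in group) / len(group)

    pairs = []
    styles = sorted(by_style, key=acc, reverse=True)
    # Group level: a correct answer from the more accurate style is
    # preferred over any answer from the less accurate style.
    if len(styles) == 2:
        better, worse = styles
        for text_b, correct_b in by_style[better]:
            if correct_b:
                for text_w, _ in by_style[worse]:
                    pairs.append((text_b, text_w))
    # Instance level: within each style, a shorter correct answer is
    # preferred over a longer correct one.
    for group in by_style.values():
        correct_texts = sorted([t for t, c in group if c], key=len)
        for shorter, longer in zip(correct_texts, correct_texts[1:]):
            pairs.append((shorter, longer))
    return pairs

pairs = bi_level_pairs([
    ("long", "step1 step2 step3 answer=4", True),
    ("long", "very very long wrong chain", False),
    ("short", "answer=4", True),
    ("short", "thus 4", True),
])
```

The resulting (chosen, rejected) pairs could then feed a standard preference-optimization objective such as DPO; the abstract does not specify which objective the authors use.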