

AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting

May 24, 2025
Authors: Shijue Huang, Hongru Wang, Wanjun Zhong, Zhaochen Su, Jiazhan Feng, Bowen Cao, Yi R. Fung
cs.AI

Abstract

Modern large reasoning models demonstrate impressive problem-solving capabilities by employing sophisticated reasoning strategies. However, they often struggle to balance efficiency and effectiveness, frequently generating unnecessarily lengthy reasoning chains for simple problems. In this work, we propose AdaCtrl, a novel framework that supports both difficulty-aware adaptive allocation of the reasoning budget and explicit user control over reasoning depth. AdaCtrl dynamically adjusts its reasoning length based on self-assessed problem difficulty, while also allowing users to manually control the budget to prioritize either efficiency or effectiveness. This is achieved through a two-stage training pipeline: an initial cold-start fine-tuning phase that instills the ability to self-assess difficulty and adjust the reasoning budget, followed by a difficulty-aware reinforcement learning (RL) stage that refines the model's adaptive reasoning strategies and calibrates its difficulty assessments against its evolving capabilities during online training. To enable intuitive user interaction, we design explicit length-triggered tags that function as a natural interface for budget control. Empirical results show that AdaCtrl adapts reasoning length to estimated difficulty: compared with a standard training baseline that also incorporates fine-tuning and RL, it improves performance while reducing response length by 10.06% and 12.14% on the more challenging AIME2024 and AIME2025 datasets, which require elaborate reasoning, and by 62.05% and 91.04% on MATH500 and GSM8K, where more concise responses are sufficient. Furthermore, AdaCtrl enables precise user control over the reasoning budget, allowing for tailored responses that meet specific needs.

