AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting

Abstract

Modern large reasoning models demonstrate impressive problem-solving capabilities by employing sophisticated reasoning strategies. However, they often struggle to balance efficiency and effectiveness, frequently generating unnecessarily long reasoning chains for simple problems. In this work, we propose AdaCtrl, a novel framework that supports both difficulty-aware adaptive allocation of the reasoning budget and explicit user control over reasoning depth. AdaCtrl dynamically adjusts its reasoning length based on its own assessment of problem difficulty, while also allowing users to manually control the budget to prioritize either efficiency or effectiveness. This is achieved through a two-stage training pipeline: an initial cold-start fine-tuning phase that instills the ability to self-assess problem difficulty and adjust the reasoning budget accordingly, followed by a difficulty-aware reinforcement learning (RL) stage that refines the model's adaptive reasoning strategies and calibrates its difficulty assessments against its evolving capabilities during online training. To enable intuitive user interaction, we design explicit length-triggered tags that function as a natural interface for budget control. Empirical results show that AdaCtrl adapts its reasoning length to the estimated difficulty. Compared to a standard training baseline that also incorporates fine-tuning and RL, it improves performance while reducing response length by 10.06% and 12.14% on the more challenging AIME2024 and AIME2025 datasets, which require elaborate reasoning, and by 62.05% and 91.04% on the MATH500 and GSM8K datasets, where more concise responses are sufficient. Furthermore, AdaCtrl enables precise user control over the reasoning budget, allowing responses to be tailored to specific needs.
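As a rough illustration of how a length-triggered tag could serve as the budget-control interface, the minimal sketch below builds the text that starts a model response. The tag strings `[Easy]` and `[Hard]`, the function name `build_response_prefix`, and the choice to place the tag at the start of the response are all assumptions made for illustration; the abstract does not specify the tag format.

```python
from typing import Optional

# Hypothetical tag strings; the abstract does not give the actual tag format.
EASY_TAG = "[Easy]"
HARD_TAG = "[Hard]"


def build_response_prefix(user_budget: Optional[str] = None) -> str:
    """Return the text used to start the model's response (illustrative only).

    user_budget:
        "easy" -> request a short reasoning budget,
        "hard" -> request a long reasoning budget,
        None   -> leave the choice to the model's own difficulty estimate.
    """
    if user_budget is None:
        # No override: the model is assumed to emit its own tag
        # after assessing the problem's difficulty.
        return ""
    tags = {"easy": EASY_TAG, "hard": HARD_TAG}
    if user_budget not in tags:
        raise ValueError(f"unknown budget setting: {user_budget!r}")
    return tags[user_budget] + " "
```

Under these assumptions, passing `"easy"` prioritizes efficiency, passing `"hard"` prioritizes effectiveness, and omitting the argument falls back to the adaptive behavior described above.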
