Scalable Chain of Thoughts via Elastic Reasoning

Abstract

Large reasoning models (LRMs) have achieved remarkable progress on complex tasks by generating extended chains of thought (CoT). However, their uncontrolled output lengths pose significant challenges for real-world deployment, where inference-time budgets on tokens, latency, or compute are strictly constrained. We propose Elastic Reasoning, a novel framework for scalable chains of thought that explicitly separates reasoning into two phases, thinking and solution, each with an independently allocated budget. At test time, Elastic Reasoning prioritizes the completeness of the solution segment, significantly improving reliability under tight resource constraints. To train models that are robust to truncated thinking, we introduce a lightweight budget-constrained rollout strategy, integrated into GRPO, that teaches the model to reason adaptively when the thinking process is cut short; this strategy generalizes effectively to unseen budget constraints without additional training. Empirical results on mathematical (AIME, MATH500) and programming (LiveCodeBench, Codeforces) benchmarks demonstrate that Elastic Reasoning performs robustly under strict budget constraints while incurring significantly lower training cost than baseline methods. Remarkably, our approach also produces more concise and efficient reasoning even in unconstrained settings. Elastic Reasoning offers a principled and practical solution to the pressing challenge of controllable reasoning at scale.
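The two-phase budgeting described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes the following: the phases are separated by a delimiter token (written here as a hypothetical `</think>`); when the thinking budget runs out, that delimiter is forced and decoding moves on to the solution under its own budget; and a GRPO-style trainer normalizes rewards within a group of such budget-constrained rollouts. The names `elastic_generate`, `group_advantages`, `toy_model`, and the `<eos>` token are all illustrative placeholders. The choice of population standard deviation is also an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List

END_THINK = "</think>"  # hypothetical delimiter between thinking and solution


@dataclass
class Rollout:
    thinking: List[str]
    solution: List[str]
    truncated: bool  # True if thinking hit its budget before emitting END_THINK


def elastic_generate(next_token: Callable[[List[str]], str], prompt: List[str],
                     think_budget: int, solve_budget: int,
                     eos: str = "<eos>") -> Rollout:
    """Decode with independent token budgets for the thinking and solution phases."""
    context = list(prompt)
    thinking: List[str] = []
    truncated = True
    # Phase 1: thinking, capped at think_budget tokens.
    while len(thinking) < think_budget:
        tok = next_token(context)
        if tok == END_THINK:
            truncated = False
            break
        thinking.append(tok)
        context.append(tok)
    # Whether thinking ended naturally or was cut off, insert the delimiter
    # so the solution phase always gets its own full budget.
    context.append(END_THINK)
    # Phase 2: solution, capped at solve_budget tokens.
    solution: List[str] = []
    while len(solution) < solve_budget:
        tok = next_token(context)
        if tok == eos:
            break
        solution.append(tok)
        context.append(tok)
    return Rollout(thinking, solution, truncated)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantages: (r - group mean) / (group std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_model(context: List[str]) -> str:
    """Hypothetical stand-in for an LLM: 'thinks' for 5 tokens, then answers '42'."""
    if END_THINK not in context:
        return "step" if context.count("step") < 5 else END_THINK
    return "42" if context[-1] == END_THINK else "<eos>"


if __name__ == "__main__":
    short = elastic_generate(toy_model, ["Q:"], think_budget=3, solve_budget=4)
    full = elastic_generate(toy_model, ["Q:"], think_budget=10, solve_budget=4)
    print(short)  # thinking cut at 3 tokens, but the answer is still produced
    print(full)   # thinking ends on its own after 5 tokens
    print(group_advantages([1.0, 0.0, 1.0, 0.0]))
```

Truncation cuts only the thinking phase, so a tight thinking budget still leaves room for a complete solution. In a training loop, rollouts like these would be scored (for example, by answer correctness), and `group_advantages` would turn those scores into per-rollout advantages.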
