Steering LLM Thinking with Budget Guidance

Abstract

Recent deep-thinking large language models often reason extensively to improve performance, but such lengthy reasoning is not always desirable, as it incurs excessive inference costs without proportionate performance gains. Controlling reasoning length without sacrificing performance is therefore important, but remains challenging, especially under tight thinking budgets. We propose budget guidance, a simple yet effective method for steering the reasoning process of LLMs toward a target budget without requiring any LLM fine-tuning. Our approach introduces a lightweight predictor that models a Gamma distribution over the remaining thinking length during next-token generation. This signal is then used to guide generation in a soft, token-level manner, ensuring that the overall reasoning trace adheres to the specified thinking budget. Budget guidance enables natural control of the thinking length, along with significant token efficiency improvements over baseline methods on challenging math benchmarks. For instance, it achieves up to a 26% accuracy gain on the MATH-500 benchmark under tight budgets compared to baseline methods, while maintaining competitive accuracy with only 63% of the thinking tokens used by the full-thinking model. Budget guidance also generalizes to broader task domains and exhibits emergent capabilities, such as estimating question difficulty. The source code is available at https://github.com/UMass-Embodied-AGI/BudgetGuidance.
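The abstract does not spell out how the Gamma-distributed length signal enters decoding, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a hypothetical predictor that supplies, for each candidate next token, a (shape, rate) pair describing the remaining thinking length if that token is chosen, and it reweights the language model's next-token probabilities by the Gamma CDF evaluated at the remaining budget. The function names, the token strings, and the example parameters are all illustrative assumptions.

```python
import math


def gamma_cdf(x, shape, rate, max_terms=10_000, tol=1e-12):
    """P(X <= x) for X ~ Gamma(shape, rate).

    Uses the power series of the regularized lower incomplete gamma
    function; adequate for the moderate values of rate * x used here.
    """
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= z / (shape + n)
        total += term
        if term < tol * total:  # remaining terms are negligible
            break
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def guide_next_token(lm_probs, gamma_params, remaining_budget):
    """Soft, token-level reweighting (assumed rule, not from the paper).

    lm_probs:         {token: probability} from the unmodified LLM
    gamma_params:     {token: (shape, rate)} from a hypothetical predictor
    remaining_budget: thinking tokens still allowed
    """
    weighted = {
        tok: p * gamma_cdf(remaining_budget, *gamma_params[tok])
        for tok, p in lm_probs.items()
    }
    norm = sum(weighted.values())
    if norm == 0.0:             # no token fits the budget: fall back to the LLM
        return dict(lm_probs)
    return {tok: w / norm for tok, w in weighted.items()}


# Hypothetical example: "Wait" keeps reasoning (mean remaining length
# shape / rate = 200 tokens), "</think>" ends the thinking phase.
lm_probs = {"Wait": 0.7, "</think>": 0.3}
gamma_params = {"Wait": (4.0, 0.02), "</think>": (1.0, 1.0)}

tight = guide_next_token(lm_probs, gamma_params, remaining_budget=50)
loose = guide_next_token(lm_probs, gamma_params, remaining_budget=2000)
print(tight)
print(loose)
```

Under the tight budget, most of the probability mass shifts to the token that ends thinking, while under the loose budget the distribution stays close to the original one. This illustrates the sense in which such guidance is soft: no token is forbidden outright, and the model's own preferences still dominate when the budget is not binding.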
