PATS: Process-Level Adaptive Thinking Mode Switching

Abstract

Current large language models (LLMs) typically apply a fixed reasoning strategy, either simple or complex, to every question regardless of its difficulty. Ignoring how complexity varies across tasks and across the steps of a reasoning process leads to an imbalance between performance and efficiency. Existing methods implement training-free switching between fast and slow thinking systems to handle problems of varying difficulty, but they are limited to coarse-grained, solution-level strategy adjustments. To address this issue, we propose a novel reasoning paradigm, Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between accuracy and computational efficiency. Our approach integrates Process Reward Models (PRMs) with beam search and incorporates progressive mode switching and bad-step penalty mechanisms. Experiments on diverse mathematical benchmarks demonstrate that our method achieves high accuracy while maintaining moderate token usage. This study emphasizes the significance of process-level, difficulty-aware adaptation of reasoning strategies and offers insights into efficient inference for LLMs.
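
The abstract names two mechanisms, progressive mode switching and a bad-step penalty, without giving their details. The following is only a minimal sketch of how such a controller might look, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the three mode names, the mapping from mode to beam width, the threshold values, and the rule that a bad step is retried once in the most expensive mode.

```python
from dataclasses import dataclass

# Hypothetical settings: the abstract gives no concrete values.
MODES = ("simple", "medium", "complex")                # ordered cheapest to most expensive
BEAM_WIDTH = {"simple": 2, "medium": 4, "complex": 8}  # assumed candidates sampled per step
HIGH, LOW, BAD = 0.8, 0.6, 0.4                         # assumed PRM score thresholds


@dataclass
class Decision:
    next_mode: str  # mode to use for the next step (or for the retry)
    retry: bool     # True if the current step should be regenerated


def switch_mode(mode: str, prm_score: float, already_retried: bool = False) -> Decision:
    """Choose the thinking mode after a PRM has scored the best candidate of one step."""
    i = MODES.index(mode)
    # Bad-step penalty (assumed form): a very low score triggers a single
    # retry of the same step in the most expensive mode.
    if prm_score < BAD and not already_retried:
        return Decision("complex", True)
    # Progressive switching (assumed form): move at most one level per step.
    if prm_score >= HIGH:
        i = max(i - 1, 0)               # step looked easy, spend less next time
    elif prm_score < LOW:
        i = min(i + 1, len(MODES) - 1)  # step looked hard, spend more next time
    return Decision(MODES[i], False)
```

In a beam-search loop, a caller would sample `BEAM_WIDTH[mode]` candidates for the current step, score them with the PRM, keep the best one, and pass its score to `switch_mode` to decide how much compute the next step receives.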
