PATS: Process-Level Adaptive Thinking Mode Switching

Abstract

Current large language models (LLMs) typically apply a fixed reasoning strategy, either simple or complex, to every question regardless of its difficulty. Ignoring this variation in task and reasoning-process complexity leads to an imbalance between performance and efficiency. Existing methods attempt training-free switching between fast and slow thinking systems to handle problems of varying difficulty, but they are limited to coarse-grained, solution-level strategy adjustments. To address this issue, we propose a novel reasoning paradigm, Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between accuracy and computational efficiency. Our approach integrates Process Reward Models (PRMs) with beam search and incorporates progressive mode switching and a bad-step penalty mechanism. Experiments on diverse mathematical benchmarks demonstrate that our method achieves high accuracy while maintaining moderate token usage. This study emphasizes the significance of process-level, difficulty-aware adaptation of reasoning strategy and offers valuable insights into efficient inference for LLMs.
