AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

Abstract

Long-thought reasoning models have recently achieved strong performance on complex reasoning tasks, but they often incur substantial inference overhead, making efficiency a critical concern. Our empirical analysis reveals that the benefit of Long-CoT varies across problems: some problems require elaborate reasoning, while others show no improvement or even degraded accuracy. This motivates adaptive reasoning strategies that tailor reasoning depth to the input. However, prior work primarily reduces redundancy within long reasoning paths, which limits exploration of more efficient strategies beyond the Long-CoT paradigm. To address this, we propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models to enable diverse reasoning styles. Second, we apply bi-level preference training to guide the model to select a suitable reasoning style (group level) and to prefer concise, correct reasoning within each style group (instance level). Experiments demonstrate that our method significantly reduces inference costs compared with other baseline approaches while maintaining performance. Notably, on five mathematical datasets, the average reasoning length is reduced by more than 50%, highlighting the potential of adaptive strategies to optimize reasoning efficiency in large language models. Our code is coming soon at https://github.com/StarDewXXX/AdaR1
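The bi-level preference idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Sample` record, the `margin` threshold, the accuracy-based rule for choosing a style, and the length-based rule for choosing a rejected response are all assumptions made here for illustration. The sketch builds one cross-style (group-level) pair and one within-style (instance-level) pair from sampled responses to a single problem.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    style: str        # "long" or "short" (hypothetical style labels)
    text: str         # the sampled reasoning trace
    correct: bool     # whether the final answer was judged correct
    num_tokens: int   # length of the trace


Pair = Tuple[Sample, Sample]  # (chosen, rejected)


def accuracy(samples: List[Sample]) -> float:
    """Fraction of correct samples; 0.0 for an empty group."""
    return sum(s.correct for s in samples) / len(samples) if samples else 0.0


def preferred_style(long_s: List[Sample], short_s: List[Sample],
                    margin: float = 0.0) -> str:
    """Assumed group-level rule: keep long reasoning only when it is
    more accurate than short reasoning by more than `margin`."""
    return "long" if accuracy(long_s) - accuracy(short_s) > margin else "short"


def group_pair(long_s: List[Sample], short_s: List[Sample],
               margin: float = 0.0) -> Optional[Pair]:
    """Cross-style pair: shortest correct sample of the preferred style
    versus the longest sample of the other style."""
    style = preferred_style(long_s, short_s, margin)
    win, lose = (long_s, short_s) if style == "long" else (short_s, long_s)
    correct = [s for s in win if s.correct]
    if not correct or not lose:
        return None
    chosen = min(correct, key=lambda s: s.num_tokens)
    rejected = max(lose, key=lambda s: s.num_tokens)
    return chosen, rejected


def instance_pair(samples: List[Sample]) -> Optional[Pair]:
    """Within-style pair: shortest correct sample versus the longest
    of the remaining samples in the same style group."""
    correct = [s for s in samples if s.correct]
    if not correct:
        return None
    chosen = min(correct, key=lambda s: s.num_tokens)
    others = [s for s in samples if s is not chosen]
    if not others:
        return None
    return chosen, max(others, key=lambda s: s.num_tokens)


if __name__ == "__main__":
    long_s = [Sample("long", "L1", True, 900), Sample("long", "L2", True, 1400)]
    short_s = [Sample("short", "S1", True, 120), Sample("short", "S2", False, 90)]
    print(preferred_style(long_s, short_s))
    print(group_pair(long_s, short_s))
    print(instance_pair(short_s))
```

Pairs of this (chosen, rejected) shape are the usual input to preference-optimization objectives; which objective and which selection rules the paper actually uses are not stated in the abstract.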

