OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Abstract

Recent advanced large reasoning models (LRMs) leverage extended chain-of-thought (CoT) reasoning to solve complex tasks, achieving state-of-the-art performance. Despite their success, we identify a critical issue: a substantial portion of the simple tasks solved by LRMs can also be solved by non-reasoning LLMs using significantly fewer tokens, indicating that complex reasoning may not always be necessary. To address this, we systematically analyze the reasoning trajectories of LRMs and present a method that uses the identified paradigms and an LLM-Judge to classify these trajectories as either Redundant Reasoning or Essential Reasoning. We then introduce OThink-R1, a method that prunes redundant reasoning steps while preserving logical validity. OThink-R1 dynamically employs the non-thinking mode (fast thinking) for straightforward problems while engaging in deliberate thinking (slow thinking) for complex problems. Experiments across mathematical and question-answering tasks demonstrate that OThink-R1 reduces reasoning redundancy by almost 23% on average without compromising accuracy, offering practical guidelines for efficient reasoning models. The code is available at https://github.com/AgenticIR-Lab/OThink-R1.
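
As a rough illustration of the fast/slow switching idea, the sketch below shows one way a training target could be assembled once each trajectory has been labeled. It is not the authors' implementation: the `Trajectory` type, the `<think>` delimiters, the `llm_answer_correct` flag, and the `is_redundant` callable (a stand-in for the LLM-Judge) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    """Hypothetical container for one LRM output (names are assumptions)."""
    question: str
    reasoning: str  # text the LRM produced while thinking
    answer: str     # final answer produced by the LRM


def build_training_target(
    traj: Trajectory,
    llm_answer_correct: bool,
    is_redundant: Callable[[Trajectory], bool],
) -> str:
    """Return a fast-mode target when the reasoning can be pruned, else slow-mode.

    Assumption: reasoning is pruned only when a non-reasoning LLM also solved
    the question AND the judge labels the trajectory as redundant.
    """
    if llm_answer_correct and is_redundant(traj):
        # Fast thinking: keep the answer, leave the thinking segment empty.
        return f"<think>\n\n</think>\n{traj.answer}"
    # Slow thinking: keep the full (essential) reasoning.
    return f"<think>\n{traj.reasoning}\n</think>\n{traj.answer}"


def token_reduction(tokens_before: int, tokens_after: int) -> float:
    """Fraction of generated tokens saved relative to the original output."""
    return 1.0 - tokens_after / tokens_before
```

Under these assumptions, a question that the non-reasoning LLM answers incorrectly always keeps its full reasoning, regardless of what the judge says, so pruning cannot remove reasoning the model actually needed.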
