推論モデルは思考なしでも効果的であり得る

要旨

近年の大規模言語モデル（LLM）は、主に生成プロセスの一部として明示的で長い思考プロセスを含めることで、推論能力を大幅に向上させてきました。本論文では、この明示的な思考が本当に必要かどうかを問います。最先端のDeepSeek-R1-Distill-Qwenを使用して、単純なプロンプトで思考プロセスをバイパスする「NoThinking」が驚くほど効果的であることを発見しました。トークン数を制御した場合、NoThinkingは数学的問題解決、形式的定理証明、コーディングなど、多様な7つの難易度の高い推論データセットにおいて、特に低予算設定（例：700トークンでACM 23において51.3対28.9）でThinkingを上回りました。注目すべきは、kが増加するにつれてNoThinkingのパフォーマンスがpass@kにおいてより競争的になることです。この観察に基づき、NoThinkingを使用してN個の出力を独立して生成し、それらを集約する並列スケーリングアプローチが非常に効果的であることを示します。集約には、利用可能な場合はタスク固有の検証器を使用し、それ以外の場合は信頼度に基づく選択などの単純なbest-of-N戦略を適用します。我々の手法は、Thinkingを使用した類似のレイテンシを持つ一連のベースラインを上回り、著しく長いレイテンシ（最大9倍）を持つThinkingと同等の性能を発揮します。全体として、本研究は長い思考プロセスの必要性を再考することを促すと同時に、低予算設定または低レイテンシで並列スケーリングを使用して強力な推論性能を達成するための競争力のある参照を確立します。

English

Recent LLMs have significantly improved reasoning capabilities, primarily by including an explicit, lengthy Thinking process as part of generation. In this paper, we question whether this explicit thinking is necessary. Using the state-of-the-art DeepSeek-R1-Distill-Qwen, we find that bypassing the thinking process via simple prompting, denoted as NoThinking, can be surprisingly effective. When controlling for the number of tokens, NoThinking outperforms Thinking across a diverse set of seven challenging reasoning datasets--including mathematical problem solving, formal theorem proving, and coding--especially in low-budget settings, e.g., 51.3 vs. 28.9 on ACM 23 with 700 tokens. Notably, the performance of NoThinking becomes more competitive with pass@k as k increases. Building on this observation, we demonstrate that a parallel scaling approach that uses NoThinking to generate N outputs independently and aggregates them is highly effective. For aggregation, we use task-specific verifiers when available, or we apply simple best-of-N strategies such as confidence-based selection. Our method outperforms a range of baselines with similar latency using Thinking, and is comparable to Thinking with significantly longer latency (up to 9x). Together, our research encourages a reconsideration of the necessity of lengthy thinking processes, while also establishing a competitive reference for achieving strong reasoning performance in low-budget settings or at low latency using parallel scaling.

推論モデルは思考なしでも効果的であり得る

Reasoning Models Can Be Effective Without Thinking

要旨

Support