Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency

Abstract

Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by up to 27%-51% in five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
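The abstract says only that NoWait suppresses tokens such as "Wait" and "Hmm" during inference. As a minimal sketch of what such suppression could look like, the example below masks the logits of keyword tokens before a greedy decoding step. The toy vocabulary, the keyword list beyond the two named tokens, and the helper names (banned_token_ids, suppress, greedy_pick) are all hypothetical and are not taken from the paper.

```python
import math

# Assumption: only "Wait" and "Hmm" are named in the abstract; a real
# keyword list would likely be longer.
REFLECTION_KEYWORDS = ("wait", "hmm")


def banned_token_ids(vocab, keywords=REFLECTION_KEYWORDS):
    """Collect ids of tokens that are a reflection keyword, ignoring
    case, surrounding whitespace, and trailing punctuation."""
    banned = set()
    for token, token_id in vocab.items():
        if token.strip(" ,.!?").lower() in keywords:
            banned.add(token_id)
    return banned


def suppress(logits, banned):
    """Return a copy of the logits with banned positions set to -inf,
    so they can never be selected at this decoding step."""
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


def greedy_pick(logits):
    """Index of the highest logit (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary and one step of logits (illustrative values only).
toy_vocab = {"Wait": 0, " wait,": 1, "Hmm": 2, "So": 3, " the": 4}
toy_logits = [5.0, 1.0, 4.0, 3.0, 0.5]

banned = banned_token_ids(toy_vocab)
unsuppressed_choice = greedy_pick(toy_logits)
suppressed_choice = greedy_pick(suppress(toy_logits, banned))
```

In this toy step the unsuppressed decoder would choose "Wait", while the masked decoder falls through to the best remaining token, "So". Applying the same mask at every step is one plausible way to realize a plug-and-play intervention that needs no retraining.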
