Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

Abstract

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.
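The branch-and-merge step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released implementation. The helper names (`softmax`, `multiplex_step`) are hypothetical. The K candidates are assumed to be drawn i.i.d. with replacement from the next-token distribution, and their embeddings are assumed to be merged by a plain mean; the abstract says only that they are "aggregated". Under these assumptions the sketch shows the two properties the abstract highlights. First, the step's joint log-probability is the sum of the K per-draw log-probabilities, which is the tractable quantity an on-policy RL objective needs. Second, a peaked distribution collapses the multiplex token onto a single vocabulary embedding.

```python
from __future__ import annotations

import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiplex_step(
    logits: list[float],
    embeddings: list[list[float]],
    k: int,
    rng: random.Random,
) -> tuple[list[int], list[float], float]:
    """Hypothetical sketch of one multiplex thinking step.

    Assumptions (not taken from the paper): the k candidate tokens are
    sampled i.i.d. with replacement, and merging is an unweighted mean
    of their embeddings (repeated tokens count once per draw).
    """
    probs = softmax(logits)
    # Branch: draw k candidate token ids from the next-token distribution.
    tokens = rng.choices(range(len(probs)), weights=probs, k=k)
    # Merge: average the k sampled embeddings into one continuous vector.
    dim = len(embeddings[0])
    merged = [sum(embeddings[t][d] for t in tokens) / k for d in range(dim)]
    # With i.i.d. draws, the joint log-probability factorizes into a sum.
    log_prob = sum(math.log(probs[t]) for t in tokens)
    return tokens, merged, log_prob


# Toy 3-token vocabulary with 2-dimensional embeddings.
rng = random.Random(0)
emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Confident step: one logit dominates, so all draws almost surely coincide.
confident = multiplex_step([20.0, 0.0, 0.0], emb, k=4, rng=rng)
# Uncertain step: a flat distribution yields a blend of several candidates.
uncertain = multiplex_step([0.0, 0.0, 0.0], emb, k=8, rng=rng)
print("confident:", confident)
print("uncertain:", uncertain)
```

With a sharply peaked distribution the merged vector equals one vocabulary embedding, so the step behaves like ordinary discrete CoT. With a flat distribution the merged vector mixes several candidates while still occupying a single sequence position, which is how the sequence stays short.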
