
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

January 13, 2026
作者: Yao Tang, Li Dong, Yaru Hao, Qingxiu Dong, Furu Wei, Jiatao Gu
cs.AI

Abstract

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.
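The core step described above — sample K candidate tokens, then aggregate their embeddings into one continuous multiplex token — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `multiplex_token`, the probability-weighted convex combination, and the choice of sampling with replacement are all assumptions for exposition; the paper's exact aggregation rule may differ.

```python
import torch
import torch.nn.functional as F

def multiplex_token(logits, embedding, k=4, temperature=1.0):
    """One hypothetical multiplex thinking step: sample K candidate
    tokens from the next-token distribution, then return a convex
    combination of their embeddings, weighted by renormalized
    probabilities, as a single continuous 'multiplex token'."""
    probs = F.softmax(logits / temperature, dim=-1)                   # (vocab,)
    idx = torch.multinomial(probs, num_samples=k, replacement=True)   # K sampled candidates
    weights = probs[idx] / probs[idx].sum()                           # renormalize over the K draws
    cand_emb = embedding(idx)                                         # (K, d_model)
    return (weights.unsqueeze(-1) * cand_emb).sum(dim=0)              # (d_model,)

# Toy usage: vocabulary of 10, embedding dimension 8.
torch.manual_seed(0)
emb = torch.nn.Embedding(10, 8)
tok = multiplex_token(torch.randn(10), emb, k=4)
```

Note the self-adaptive property the abstract mentions falls out of this formulation: when one token dominates the distribution, all K draws collapse onto it and the multiplex token equals that token's discrete embedding, recovering standard CoT behavior.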