Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

January 13, 2026
Authors: Yao Tang, Li Dong, Yaru Hao, Qingxiu Dong, Furu Wei, Jiatao Gu
cs.AI

Abstract

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.
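As a rough illustration of the branch-and-merge step described above, here is a minimal PyTorch sketch of one thinking step. It assumes i.i.d. sampling with replacement and probability-weighted merging of the candidate embeddings; `multiplex_step`, `k`, and `temperature` are hypothetical names for exposition, not the authors' released API (see the linked repository for the actual implementation).

```python
import torch
import torch.nn.functional as F

def multiplex_step(logits: torch.Tensor, embedding: torch.nn.Embedding,
                   k: int = 4, temperature: float = 1.0):
    """One branch-and-merge thinking step (illustrative sketch).

    Samples K candidate tokens from the next-token distribution and
    merges their embeddings into a single continuous multiplex token,
    which would be fed back as the next input embedding.
    """
    probs = F.softmax(logits / temperature, dim=-1)   # (vocab_size,)
    # Branch: draw K candidates. I.i.d. sampling with replacement is
    # an assumption -- the abstract only says K tokens are sampled.
    ids = torch.multinomial(probs, num_samples=k, replacement=True)
    cand = embedding(ids)                             # (k, hidden_dim)
    # Merge: weight each candidate by its renormalized probability.
    # (The exact aggregation rule is an assumption; a uniform mean
    # over the K samples is another plausible choice.)
    w = probs[ids] / probs[ids].sum()                 # (k,)
    multiplex_token = (w.unsqueeze(-1) * cand).sum(dim=0)
    return multiplex_token, ids
```

Under this construction the self-adaptive behavior described in the abstract falls out naturally: when the model is confident, the K samples collapse onto the same token and the merged vector reduces to an ordinary discrete embedding, while a flatter distribution yields a genuine mixture of plausible next steps without adding any sequence length.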