DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs
January 7, 2026
Authors: Shidong Cao, Hongzhan Lin, Yuxuan Gu, Ziyang Luo, Jing Ma
cs.AI
Abstract
Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes propagate irreversibly through autoregressive decoding. In this work, we propose DiffCoT, a diffusion-styled CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT integrates diffusion principles at the reasoning-step level via a sliding-window mechanism, enabling unified generation and retrospective correction of intermediate steps while preserving token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones demonstrate that DiffCoT consistently outperforms existing CoT preference optimization methods, yielding improved robustness and error-correction capability in CoT reasoning.
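The abstract leaves the two mechanisms at a high level; the sketch below shows one plausible reading in Python. Reasoning steps are redrafted over several denoising rounds, a sliding window supplies the local context for each step, and a causal schedule assigns less noise to earlier steps than to later ones, so the chain settles front-to-back. All names here (`causal_noise_schedule`, `diffcot_denoise`, `refine`) and the linear form of the schedule are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List

def causal_noise_schedule(num_steps: int, t: float,
                          temporal_bias: float = 0.5) -> List[float]:
    """Per-step noise levels at global denoising time t in [0, 1].

    Later reasoning steps receive more noise than earlier ones, so the
    chain is resolved front-to-back and causal order is preserved.
    The linear form is an assumption; the paper does not specify it.
    """
    levels = []
    for i in range(num_steps):
        position = i / max(num_steps - 1, 1)  # 0 = first step, 1 = last
        levels.append(min(1.0, t * (1.0 + temporal_bias * position)))
    return levels

def diffcot_denoise(
    steps: List[str],
    refine: Callable[[List[str], int, float], str],
    window: int = 3,
    num_rounds: int = 4,
) -> List[str]:
    """Iteratively refine a draft chain of thought with a sliding window.

    `refine(context, i, noise)` is a stand-in for the LLM: it would
    regenerate step i given its windowed context and current noise
    level, decoding tokens autoregressively inside the step.
    """
    for r in range(num_rounds):
        t = 1.0 - (r + 1) / num_rounds              # global noise decays each round
        levels = causal_noise_schedule(len(steps), t)
        for i in range(len(steps)):                 # slide the window over the chain
            lo = max(0, i - window + 1)
            context = steps[lo:i + 1]               # earlier steps condition step i
            steps[i] = refine(context, i, levels[i])  # retrospective correction
    return steps

# Toy usage with a dummy "model" that just tags each revision.
draft = ["step 1: parse the problem", "step 2: set up the equation", "step 3: solve"]
dummy = lambda ctx, i, noise: ctx[-1].split(" | ")[0] + f" | refined@noise={noise:.2f}"
print(diffcot_denoise(draft, dummy))
```

Keeping the inner `refine` call autoregressive at the token level while the outer loop denoises at the step level is how this sketch mirrors the abstract's claim that diffusion is applied per reasoning step without abandoning token-level autoregression.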