Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
August 12, 2025
Authors: Wen Wang, Bozhen Fang, Chenchen Jing, Yongliang Shen, Yangyi Shen, Qiuyu Wang, Hao Ouyang, Hao Chen, Chunhua Shen
cs.AI
Abstract
Diffusion large language models (dLLMs) generate text through iterative
denoising, yet current decoding strategies discard rich intermediate
predictions in favor of the final output. Our work reveals a critical
phenomenon, temporal oscillation, in which correct answers often emerge at
intermediate denoising steps but are overwritten later. To address this
issue, we introduce two complementary methods that exploit temporal
consistency: 1) Temporal Self-Consistency Voting, a training-free, test-time
decoding strategy that aggregates predictions across denoising steps to select
the most consistent output; and 2) a post-training method termed Temporal
Consistency Reinforcement, which uses Temporal Semantic Entropy (TSE), a
measure of semantic stability across intermediate predictions, as a reward
signal to encourage stable generations. Empirical results across multiple
benchmarks demonstrate the effectiveness of our approach. Using the negative
TSE reward alone, we observe a remarkable average improvement of 24.7% on the
Countdown dataset over an existing dLLM. Combined with the accuracy reward, we
achieve absolute gains of 2.0% on GSM8K, 4.3% on MATH500, 6.6% on SVAMP, and
25.3% on Countdown. Our findings underscore the untapped
potential of temporal dynamics in dLLMs and offer two simple yet effective
tools to harness them.
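
To make the first method concrete, here is a minimal Python sketch of Temporal Self-Consistency Voting as the abstract describes it: one decoded string is kept per denoising step, an answer is extracted from each, and the most frequent answer is selected. The function name, the answer-extraction hook, and the uniform (unweighted) vote are illustrative assumptions, not the paper's exact scheme.

    from collections import Counter
    from typing import Callable, List

    def temporal_self_consistency_vote(
        intermediate_texts: List[str],
        extract_answer: Callable[[str], str],
    ) -> str:
        """Pick the answer that recurs most often across denoising steps.

        `intermediate_texts` holds one decoded string per denoising step,
        ordered from the noisiest step to the final one. A plain majority
        vote is used here; the paper may weight steps differently.
        """
        answers = [extract_answer(t) for t in intermediate_texts]
        answers = [a for a in answers if a]  # skip steps with no parsable answer
        if not answers:
            return ""
        # The most frequent answer is taken as the most temporally consistent one.
        return Counter(answers).most_common(1)[0][0]

For example, if the extracted answers across steps are ["", "12", "12", "15"], the vote returns "12" even though the final step produced "15", which is precisely the temporal-oscillation failure mode the method targets.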
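
The second method's reward signal can be sketched in the same spirit. Below, intermediate answers are grouped into clusters and the Shannon entropy of the cluster distribution is taken as the Temporal Semantic Entropy; the negative entropy is the reward, so stable generations score higher. Canonicalized exact match stands in for true semantic equivalence here, which is an assumption of this sketch rather than the paper's definition.

    import math
    from collections import Counter
    from typing import Callable, List

    def temporal_semantic_entropy(
        intermediate_answers: List[str],
        canonicalize: Callable[[str], str] = lambda s: s.strip().lower(),
    ) -> float:
        """Shannon entropy over clusters of intermediate answers.

        Clustering is approximated by canonicalized exact match; a real
        implementation would test semantic equivalence (e.g. mathematical
        equality of the answers).
        """
        clusters = Counter(canonicalize(a) for a in intermediate_answers)
        total = sum(clusters.values())
        return -sum((n / total) * math.log(n / total) for n in clusters.values())

    def tse_reward(intermediate_answers: List[str]) -> float:
        # Negative TSE: low-entropy (stable) trajectories earn higher reward.
        return -temporal_semantic_entropy(intermediate_answers)

A trajectory whose intermediate answers all agree yields a TSE of 0 and thus the maximal reward of 0; the more the answers oscillate across steps, the more negative the reward becomes.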