Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
August 1, 2025
Authors: Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin
cs.AI
Abstract
Diffusion Large Language Models (DLLMs) are emerging as a powerful
alternative to the dominant Autoregressive Large Language Models, offering
efficient parallel generation and capable global context modeling. However, the
practical application of DLLMs is hindered by a critical architectural
constraint: the need for a statically predefined generation length. This static
length allocation leads to a problematic trade-off: insufficient lengths
cripple performance on complex tasks, while excessive lengths incur significant
computational overhead and sometimes result in performance degradation. While
the inference framework is rigid, we observe that the model itself possesses
internal signals that correlate with the optimal response length for a given
task. To bridge this gap, we leverage these latent signals and introduce
DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive
Length Expansion for Diffusion Large Language Models. DAEDAL operates in two
phases: 1) Before the denoising process, DAEDAL starts from a short initial
length and iteratively expands it to a coarse task-appropriate length, guided
by a sequence completion metric. 2) During the denoising process, DAEDAL
dynamically intervenes by pinpointing and expanding insufficient generation
regions through mask token insertion, ensuring the final output is fully
developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves
performance comparable, and in some cases superior, to meticulously tuned
fixed-length baselines, while simultaneously enhancing computational efficiency
by achieving a higher effective token ratio. By resolving the static length
constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap
with their Autoregressive counterparts and paving the way for more efficient
and capable generation.
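
To make the two-phase procedure described above more concrete, the following is a minimal sketch of how a training-free, variable-length denoising loop of this kind could be structured. It is an illustration under stated assumptions, not the paper's implementation: the model interface (a callable returning per-position logits), the use of EOS-token probability as the "sequence completion" signal, the low-confidence criterion for locating under-developed regions, and all names and thresholds (`MASK_ID`, `EOS_ID`, `completion_score`, `initial_length_expansion`, `denoise_with_dynamic_expansion`, `conf_floor`, `growth`) are hypothetical.

```python
import torch

MASK_ID = 0   # hypothetical id of the [MASK] token
EOS_ID = 1    # hypothetical id of the end-of-sequence token


def completion_score(resp_logits: torch.Tensor) -> float:
    """Heuristic 'sequence completion' signal (an assumption, not the paper's
    exact metric): probability mass placed on EOS at the final masked position.
    A low score suggests the current canvas is too short for the task."""
    probs = resp_logits[-1].softmax(dim=-1)
    return probs[EOS_ID].item()


def initial_length_expansion(model, prompt_ids, init_len=64, max_len=1024,
                             growth=2, threshold=0.5):
    """Phase 1: before denoising, grow an all-mask canvas from a short initial
    length until the model signals the response could complete within it."""
    length = init_len
    while length < max_len:
        canvas = torch.full((length,), MASK_ID, dtype=torch.long)
        logits = model(torch.cat([prompt_ids, canvas]))        # (seq, vocab)
        if completion_score(logits[len(prompt_ids):]) >= threshold:
            break
        length = min(length * growth, max_len)
    return length


def denoise_with_dynamic_expansion(model, prompt_ids, length, steps=32,
                                   conf_floor=0.1, expand_by=8, max_len=1024):
    """Phase 2: iterative unmasking; positions the model remains very unsure
    about are treated as under-developed regions and padded with extra masks."""
    canvas = torch.full((length,), MASK_ID, dtype=torch.long)
    for _ in range(steps):
        logits = model(torch.cat([prompt_ids, canvas]))[len(prompt_ids):]
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        masked = canvas == MASK_ID
        if not masked.any():
            break
        # Unmask the most confident masked positions (standard DLLM decoding).
        k = max(1, int(masked.sum().item()) // 4)
        cand = torch.where(masked, conf, torch.zeros_like(conf))
        top = cand.topk(k).indices
        canvas[top] = pred[top]
        # Expand a region where even the best masked guess stays low-confidence,
        # by inserting additional mask tokens at that point.
        weak = (masked & (conf < conf_floor)).nonzero(as_tuple=True)[0]
        if len(weak) > 0 and len(canvas) + expand_by <= max_len:
            pos = weak[0].item()
            pad = torch.full((expand_by,), MASK_ID, dtype=torch.long)
            canvas = torch.cat([canvas[:pos], pad, canvas[pos:]])
    return canvas
```

The intent of the sketch is only to show where the two interventions sit relative to an ordinary confidence-based unmasking loop: the length estimate is settled coarsely before any token is committed, and fine-grained expansion happens inside the loop by inserting mask tokens rather than by restarting generation at a larger fixed length.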