Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
August 1, 2025
Authors: Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Jiaqi Wang, Dahua Lin
cs.AI
Abstract
Diffusion Large Language Models (DLLMs) are emerging as a powerful
alternative to the dominant Autoregressive Large Language Models, offering
efficient parallel generation and capable global context modeling. However, the
practical application of DLLMs is hindered by a critical architectural
constraint: the need for a statically predefined generation length. This static
length allocation leads to a problematic trade-off: insufficient lengths
cripple performance on complex tasks, while excessive lengths incur significant
computational overhead and sometimes result in performance degradation. While
the inference framework is rigid, we observe that the model itself possesses
internal signals that correlate with the optimal response length for a given
task. To bridge this gap, we leverage these latent signals and introduce
DAEDAL, a novel training-free denoising strategy that enables Dynamic Adaptive
Length Expansion for Diffusion Large Language Models. DAEDAL operates in two
phases: 1) Before the denoising process, DAEDAL starts from a short initial
length and iteratively expands it to a coarse task-appropriate length, guided
by a sequence completion metric. 2) During the denoising process, DAEDAL
dynamically intervenes by pinpointing and expanding insufficient generation
regions through mask token insertion, ensuring the final output is fully
developed. Extensive experiments on DLLMs demonstrate that DAEDAL achieves
performance comparable, and in some cases superior, to meticulously tuned
fixed-length baselines, while simultaneously enhancing computational efficiency
by achieving a higher effective token ratio. By resolving the static length
constraint, DAEDAL unlocks new potential for DLLMs, bridging a critical gap
with their Autoregressive counterparts and paving the way for more efficient
and capable generation.
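
The two-phase procedure described above can be made concrete with a short sketch. The code below is a minimal illustration, not the paper's implementation: the model interface (a callable returning per-position logits over the vocabulary), the use of EOS probability at the final position as the sequence completion metric, and all token ids, thresholds, and expansion sizes (MASK_ID, EOS_ID, EXPAND_CHUNK, conf_threshold, grow) are assumptions made purely for the example.

```python
import torch

# Hypothetical special-token ids and hyperparameters (illustrative only).
MASK_ID, EOS_ID = 0, 1
EXPAND_CHUNK = 32          # mask tokens appended per Phase-1 expansion step


def completion_score(logits: torch.Tensor) -> float:
    """Assumed sequence-completion metric: the probability the model places
    on EOS at the last position of the current canvas."""
    return torch.softmax(logits[-1], dim=-1)[EOS_ID].item()


def initial_length_expansion(model, prompt_ids, init_len=64,
                             max_len=1024, threshold=0.9):
    """Phase 1: before denoising, grow an all-mask canvas from a short
    initial length toward a coarse, task-appropriate length."""
    canvas = torch.full((init_len,), MASK_ID, dtype=torch.long)
    while canvas.numel() < max_len:
        logits = model(prompt_ids, canvas)            # (len, vocab)
        if completion_score(logits) >= threshold:     # length judged sufficient
            break
        extra = torch.full((EXPAND_CHUNK,), MASK_ID, dtype=torch.long)
        canvas = torch.cat([canvas, extra])
    return canvas


def denoise_with_expansion(model, prompt_ids, canvas, steps=16,
                           conf_threshold=0.3, grow=2, max_len=1024):
    """Phase 2: during denoising, unmask confident positions and widen
    low-confidence (under-developed) masked positions by inserting
    additional mask tokens in place."""
    for _ in range(steps):
        logits = model(prompt_ids, canvas)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)                # per-position confidence

        pieces, length = [], canvas.numel()
        for i in range(length):
            if canvas[i] != MASK_ID:                  # already committed token
                pieces.append(canvas[i:i + 1])
            elif conf[i].item() >= conf_threshold or length >= max_len:
                pieces.append(pred[i:i + 1])          # confident enough: unmask
            else:                                     # insufficient region: expand
                pieces.append(torch.full((grow,), MASK_ID, dtype=torch.long))
        canvas = torch.cat(pieces)
        if not bool((canvas == MASK_ID).any()):       # fully denoised
            break
    return canvas


if __name__ == "__main__":
    VOCAB = 100

    def toy_model(prompt_ids, canvas):
        # Stand-in for a masked-diffusion LLM: random logits, only to
        # exercise the control flow above.
        return torch.randn(canvas.numel(), VOCAB)

    prompt = torch.tensor([5, 6, 7])
    canvas = initial_length_expansion(toy_model, prompt, init_len=16, max_len=128)
    output = denoise_with_expansion(toy_model, prompt, canvas, max_len=256)
    print("final length:", output.numel())
```

The toy_model stub only exercises the control flow; in practice both routines would wrap the forward pass of an actual masked-diffusion LLM, and the completion metric and expansion rules would follow the paper's definitions rather than the placeholders used here.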