Think While You Generate: Discrete Diffusion with Planned Denoising
October 8, 2024
Authors: Sulin Liu, Juno Nam, Andrew Campbell, Hannes Stärk, Yilun Xu, Tommi Jaakkola, Rafael Gómez-Bombarelli
cs.AI
Abstract
Discrete diffusion has achieved state-of-the-art performance, outperforming
or approaching autoregressive models on standard benchmarks. In this work, we
introduce Discrete Diffusion with Planned Denoising (DDPD), a novel framework
that separates the generation process into two models: a planner and a
denoiser. At inference time, the planner selects which positions to denoise
next by identifying the positions most in need of denoising, including both
initially corrupted positions and those requiring further refinement.
This plan-and-denoise approach enables more efficient reconstruction during
generation by iteratively identifying and denoising corruptions in the optimal
order. DDPD outperforms traditional denoiser-only mask diffusion methods,
achieving superior results on language modeling benchmarks such as text8,
OpenWebText, and token-based generation on ImageNet 256×256. Notably,
in language modeling, DDPD significantly reduces the performance gap between
diffusion-based and autoregressive methods in terms of generative perplexity.
Code is available at https://github.com/liusulin/DDPD.
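To make the plan-and-denoise loop concrete, below is a minimal sketch of the inference procedure the abstract describes. The interfaces `planner(x)` (per-position corruption scores) and `denoiser(x, pos)` (token logits for one position), along with the function `ddpd_sample`, are illustrative assumptions rather than the paper's actual API, and the greedy argmax position choice simplifies the stochastic sampling scheme used in the paper.

```python
import torch

def ddpd_sample(planner, denoiser, seq_len, mask_id, num_steps):
    """Generate a sequence by alternating planning and denoising steps."""
    # Start from a fully corrupted (all-mask) sequence.
    x = torch.full((seq_len,), mask_id, dtype=torch.long)
    for _ in range(num_steps):
        # Plan: score how likely each position is to be corrupted.
        scores = planner(x)              # shape: (seq_len,)
        pos = int(torch.argmax(scores))  # greedy pick; the paper samples positions stochastically
        # Denoise: resample the chosen position from the denoiser's token distribution.
        probs = torch.softmax(denoiser(x, pos), dim=-1)  # shape: (vocab_size,)
        x[pos] = int(torch.multinomial(probs, num_samples=1))
    return x

# Toy stand-ins so the sketch runs end to end; in DDPD both are trained networks.
VOCAB, MASK = 32, 31
toy_planner = lambda x: (x == MASK).float() + 0.01 * torch.randn(x.shape)
toy_denoiser = lambda x, pos: torch.randn(VOCAB)  # random logits over the vocabulary
print(ddpd_sample(toy_planner, toy_denoiser, seq_len=16, mask_id=MASK, num_steps=16))
```

The key structural point the sketch captures is the separation of concerns: the planner decides *where* to act (covering both still-masked positions and previously denoised tokens that look wrong), while the denoiser decides *what* token to place there.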