Think While You Generate: Discrete Diffusion with Planned Denoising

Abstract

Discrete diffusion has achieved state-of-the-art performance, outperforming or approaching autoregressive models on standard benchmarks. In this work, we introduce Discrete Diffusion with Planned Denoising (DDPD), a novel framework that separates the generation process into two models: a planner and a denoiser. At inference time, the planner selects which positions to denoise next by identifying the most corrupted positions, including both positions that were corrupted initially and positions that require additional refinement. This plan-and-denoise approach enables more efficient reconstruction during generation by iteratively identifying and denoising corruptions in the optimal order. DDPD outperforms traditional denoiser-only mask diffusion methods, achieving superior results on language modeling benchmarks such as text8 and OpenWebText, as well as on token-based image generation on ImageNet 256 × 256. Notably, in language modeling, DDPD significantly reduces the performance gap between diffusion-based and autoregressive methods in terms of generative perplexity. Code is available at https://github.com/liusulin/DDPD.
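The plan-and-denoise loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (see the linked repository for that). The names `plan_and_denoise`, `toy_planner`, and `toy_denoiser` are hypothetical, the planner and denoiser here are oracle stand-ins built from a known target string rather than trained models, and picking the single highest-scoring position at each step is a simplifying assumption.

```python
import random
from typing import Callable, List, Sequence


def plan_and_denoise(
    tokens: List[str],
    planner: Callable[[Sequence[str]], List[float]],
    denoiser: Callable[[Sequence[str], int], str],
    max_steps: int = 100,
    threshold: float = 0.5,
) -> List[str]:
    """Toy loop: repeatedly rewrite the position the planner scores as most corrupted."""
    tokens = list(tokens)
    for _ in range(max_steps):
        # Hypothetical planner output: one corruption score per position.
        scores = planner(tokens)
        pos = max(range(len(tokens)), key=scores.__getitem__)
        if scores[pos] < threshold:
            break  # the planner considers every position clean
        # The denoiser proposes a new token for the selected position only.
        tokens[pos] = denoiser(tokens, pos)
    return tokens


# Oracle stand-ins (assumptions for illustration, not trained models).
TARGET = list("hello world")


def toy_planner(tokens: Sequence[str]) -> List[float]:
    # Score 1.0 where the token disagrees with the target, else 0.0.
    return [0.0 if t == c else 1.0 for t, c in zip(tokens, TARGET)]


def toy_denoiser(tokens: Sequence[str], pos: int) -> str:
    # Return the target token for the requested position.
    return TARGET[pos]


rng = random.Random(0)
noisy = [rng.choice("abcdefghijklmnopqrstuvwxyz ") for _ in TARGET]
result = plan_and_denoise(noisy, toy_planner, toy_denoiser)
print("".join(result))
```

Because the planner re-scores the whole sequence at every step, a position that was already rewritten can be selected again, which mirrors the abstract's point that the planner covers both initially corrupted positions and those needing further refinement.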

