Remasking Discrete Diffusion Models with Inference-Time Scaling

Abstract

Part of the success of diffusion models stems from their ability to perform iterative refinement, i.e., repeatedly correcting outputs during generation. However, modern masked discrete diffusion models lack this capability: once a token is generated, it cannot be updated again, even if it introduces an error. Here, we address this limitation by introducing the remasking diffusion model (ReMDM) sampler, a method that can be applied to pretrained masked diffusion models in a principled way and that is derived from a discrete diffusion model with a custom remasking backward process. Most interestingly, ReMDM endows discrete diffusion with a form of inference-time compute scaling. As the number of sampling steps increases, ReMDM generates natural language outputs that approach the quality of autoregressive models; when the computation budget is limited, ReMDM better maintains quality. ReMDM also improves the sample quality of masked diffusion models for discretized images, and in scientific domains such as molecule design, ReMDM facilitates diffusion guidance and pushes the Pareto frontier of controllability relative to classical masking and uniform noise diffusion. We provide the code along with a blog post on the project page: https://remdm.github.io.
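To make the idea of remasking concrete, the following is a toy sketch only, not the ReMDM sampler from the paper or its released code. It assumes a hypothetical stand-in denoiser (`toy_denoiser`) that proposes random tokens, a constant remasking probability (`remask_prob`), and a simple "unmask with probability 1/remaining steps" schedule; all three are illustrative assumptions. With `remask_prob=0` the loop reduces to ordinary masked sampling, where a decoded token is never revisited.

```python
import random

MASK = None  # sentinel marking a masked position
VOCAB = ["a", "b", "c"]  # tiny illustrative vocabulary (assumption)


def toy_denoiser(seq, rng):
    """Hypothetical stand-in for a pretrained masked diffusion model.

    Proposes one token per position. A real model would condition on
    the currently unmasked tokens; this one just samples at random.
    """
    return [rng.choice(VOCAB) for _ in seq]


def sample_with_remasking(length, steps, remask_prob, seed=0):
    """Toy sampling loop in which decoded tokens may be masked again."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        remaining = steps - step  # steps left, including this one
        is_last = remaining == 1
        proposals = toy_denoiser(seq, rng)
        for i in range(length):
            if seq[i] is MASK:
                # Unmask with probability 1/remaining, so every position
                # is guaranteed to be decoded by the final step.
                if rng.random() < 1.0 / remaining:
                    seq[i] = proposals[i]
            elif not is_last and rng.random() < remask_prob:
                # Remasking: give an already decoded token another
                # chance to be resampled at a later step.
                seq[i] = MASK
    return seq


if __name__ == "__main__":
    print(sample_with_remasking(length=8, steps=16, remask_prob=0.2))
```

In this sketch, raising `steps` gives each position more opportunities to be remasked and resampled, which is the intuition behind spending more inference-time compute; the actual remasking schedules and their derivation are described in the paper and on the project page.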
