CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

Abstract

Making text-to-image (T2I) generative models sample both quickly and well is a promising research direction. Previous studies have typically either enhanced the visual quality of synthesized images at the expense of sampling efficiency, or dramatically accelerated sampling without improving the base model's generative capacity. Moreover, almost no inference method has been able to ensure stable performance on both diffusion models (DMs) and visual autoregressive models (ARMs). In this paper, we introduce a novel plug-and-play inference paradigm, CoRe^2, which comprises three subprocesses: Collect, Reflect, and Refine. CoRe^2 first collects classifier-free guidance (CFG) trajectories, then uses the collected data to train a weak model that reflects the easy-to-learn content while halving the number of function evaluations during inference. Subsequently, CoRe^2 employs weak-to-strong guidance to refine the conditional output, thereby improving the model's capacity to generate high-frequency, realistic content that is difficult for the base model to capture. To the best of our knowledge, CoRe^2 is the first method to demonstrate both efficiency and effectiveness across a wide range of DMs, including SDXL, SD3.5, and FLUX, as well as ARMs such as LlamaGen. It shows significant performance improvements on HPD v2, Pick-a-Pic, DrawBench, GenEval, and T2I-CompBench. Furthermore, CoRe^2 can be seamlessly integrated with the state-of-the-art Z-Sampling, outperforming it by 0.3 on PickScore and 0.16 on AES while saving 5.64 s when using SD3.5. Code is released at https://github.com/xie-lab-ml/CoRe/tree/main.
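As a rough illustration of the two guidance ideas named in the abstract, the sketch below shows the standard CFG combination rule next to one possible form of weak-to-strong guidance. The CFG formula is the widely used one; the function `weak_to_strong_refine`, its parameter `refine_scale`, and the extrapolation rule inside it are assumptions made for illustration only, since the abstract does not state the exact update CoRe^2 uses. The toy arrays stand in for model outputs.

```python
import numpy as np


def cfg_combine(eps_cond, eps_uncond, guidance_scale):
    """Standard classifier-free guidance.

    Extrapolates from the unconditional prediction toward the conditional
    one. This needs two model evaluations per step (conditional and
    unconditional).
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def weak_to_strong_refine(strong_out, weak_out, refine_scale):
    """Hypothetical weak-to-strong guidance (an assumed form, not the paper's).

    Pushes the strong output further away from the weak model's output,
    on the premise that the weak model captures only easy-to-learn content.
    """
    return strong_out + refine_scale * (strong_out - weak_out)


# Toy stand-ins for model predictions at one sampling step.
eps_cond = np.array([1.0, 2.0])
eps_uncond = np.array([0.0, 1.0])

# CFG output: [0 + 5 * 1, 1 + 5 * 1] = [5, 6]
guided = cfg_combine(eps_cond, eps_uncond, guidance_scale=5.0)

# Pretend a small model trained on collected CFG outputs predicts this.
weak = np.array([4.0, 6.0])

# Refined output: [5 + 0.5 * 1, 6 + 0.5 * 0] = [5.5, 6]
refined = weak_to_strong_refine(guided, weak, refine_scale=0.5)

print(guided, refined)
```

In this toy setting the refinement only changes the component where the weak prediction disagrees with the guided one, which matches the intuition of amplifying what the weak model fails to capture.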
