
Phased Consistency Model

May 28, 2024
Authors: Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang
cs.AI

Abstract

The consistency model (CM) has recently made significant progress in accelerating generation with diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (known as LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all of the identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1-16 step generation settings. Although PCM is designed specifically for multi-step refinement, its 1-step generation results are comparable or even superior to those of previous state-of-the-art methods designed specifically for 1-step generation. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train a state-of-the-art few-step text-to-video generator. More details are available at https://g-u-n.github.io/projects/pcm/.
