
Phased Consistency Model

May 28, 2024
Authors: Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang
cs.AI

Abstract

The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (known as LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1-16 step generation settings. Although PCM is specifically designed for multi-step refinement, its 1-step generation results are comparable or even superior to those of previous state-of-the-art methods designed specifically for 1-step generation. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train the state-of-the-art few-step text-to-video generator. More details are available at https://g-u-n.github.io/projects/pcm/.

