

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

February 16, 2024
作者: Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen
cs.AI

Abstract

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data. Adapting large pre-trained diffusion models for higher resolution demands substantial computational and optimization resources, yet achieving a generation capability comparable to low-resolution models remains elusive. This paper proposes a novel self-cascade diffusion model that leverages the rich knowledge gained from a well-trained low-resolution model for rapid adaptation to higher-resolution image and video generation, employing either tuning-free or cheap upsampler tuning paradigms. Integrating a sequence of multi-scale upsampler modules, the self-cascade diffusion model can efficiently adapt to a higher resolution, preserving the original composition and generation capabilities. We further propose a pivot-guided noise re-schedule strategy to speed up the inference process and improve local structural details. Compared to full fine-tuning, our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters. Extensive experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
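The abstract's pivot-guided noise re-schedule starts higher-resolution sampling from a re-noised, upsampled low-resolution "pivot" sample rather than from pure noise, which shortens inference. A minimal sketch of that idea, assuming a standard DDPM forward process q(x_t | x_0) and naive nearest-neighbour upsampling (the function names and the simplified schedule here are illustrative, not the paper's implementation):

```python
import numpy as np


def upsample_nearest(x, scale):
    # Naive nearest-neighbour upsampling of an (H, W) array.
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)


def pivot_guided_init(low_res_sample, scale, alpha_bar_t, rng):
    """Re-noise an upsampled low-resolution pivot to an intermediate
    timestep t via the DDPM forward process:
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    Higher-resolution denoising then starts from x_t instead of pure
    noise, so only the remaining t steps need to be run."""
    x0 = upsample_nearest(low_res_sample, scale)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


# Example: lift a 32x32 sample to 64x64 and re-noise it to an
# intermediate timestep with cumulative alpha of 0.5.
rng = np.random.default_rng(0)
low_res = rng.standard_normal((32, 32))
x_t = pivot_guided_init(low_res, scale=2, alpha_bar_t=0.5, rng=rng)
```

The choice of the intermediate timestep trades off fidelity to the low-resolution pivot (large alpha_bar_t) against the higher-resolution model's freedom to add local structural detail (small alpha_bar_t).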

