LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
March 11, 2025
Authors: Xianfeng Wu, Yajing Bai, Haoze Zheng, Harold Haodong Chen, Yexin Liu, Zihao Wang, Xuran Ma, Wen-Jie Shu, Xianzu Wu, Harry Yang, Ser-Nam Lim
cs.AI
Abstract
Recent advances in text-to-image generation have primarily relied on
extensive datasets and parameter-heavy architectures. These requirements
severely limit accessibility for researchers and practitioners who lack
substantial computational resources. In this paper, we introduce LightGen, an
efficient training paradigm for image generation models that uses knowledge
distillation (KD) and Direct Preference Optimization (DPO). Drawing inspiration
from the success of data KD techniques widely adopted in Multi-Modal Large
Language Models (MLLMs), LightGen distills knowledge from state-of-the-art
(SOTA) text-to-image models into a compact Masked Autoregressive (MAR)
architecture with only 0.7B parameters. Using a compact synthetic dataset of
just 2M high-quality images generated from varied captions, we demonstrate
that data diversity significantly outweighs data volume in determining model
performance. This strategy dramatically reduces computational demands and
reduces pre-training time from potentially thousands of GPU-days to merely 88
GPU-days. Furthermore, to address the inherent shortcomings of synthetic data,
particularly poor high-frequency details and spatial inaccuracies, we integrate
DPO to refine image fidelity and positional accuracy.
Comprehensive experiments confirm that LightGen achieves image generation
quality comparable to SOTA models while significantly reducing computational
resources and expanding accessibility for resource-constrained environments.
Code is available at https://github.com/XianfengWu01/LightGen.
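The claim that data diversity outweighs data volume can be illustrated with a toy caption-selection heuristic: prefer captions that introduce a new "theme" before admitting repeats. The greedy selection, the first-word theme key, and the function name below are illustrative assumptions for this sketch, not the paper's actual data-curation procedure.

```python
def diverse_subset(captions, k, theme=lambda c: c.split()[0].lower()):
    """Greedily pick up to k captions, preferring ones whose theme
    (crudely, the first word) has not been seen yet."""
    seen, picked, leftover = set(), [], []
    for c in captions:
        t = theme(c)
        if t not in seen and len(picked) < k:
            seen.add(t)
            picked.append(c)
        else:
            leftover.append(c)
    # Top up with repeated themes only if distinct ones ran out.
    return picked + leftover[: k - len(picked)]

caps = ["cat on a mat", "dog in snow", "cat sleeping", "birds over water"]
print(diverse_subset(caps, 3))
# The third "cat" caption is skipped in favor of the novel "birds" theme.
```

Under a fixed image-generation budget, a selection rule in this spirit spends the budget on coverage of distinct concepts rather than near-duplicates, which is the intuition behind training on only 2M diverse synthetic images.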
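The DPO stage mentioned above optimizes a preference objective rather than a pixel loss. The following is a minimal sketch of the standard DPO loss (Rafailov et al., 2023) in its generic scalar form; LightGen's exact loss variant, preference-pair construction, and hyperparameters are not specified in this abstract.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective for one preference pair.

    Inputs are total log-probabilities of the 'chosen' (preferred) and
    'rejected' sample under the trainable policy and a frozen reference
    model. The loss pushes the policy to widen its preference margin
    relative to the reference.
    """
    # Implicit reward margin between the chosen and rejected samples.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))
```

With a zero margin the loss is log 2; it shrinks as the policy assigns relatively more probability to the preferred sample, which is how DPO can sharpen high-frequency detail and spatial placement given pairs of better and worse generations.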