

TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

December 3, 2025
Authors: Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin
cs.AI

Abstract

Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built on multi-step frameworks such as diffusion and flow matching, which inherently limits their inference efficiency, requiring 40-100 function evaluations (NFEs). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or degrade significantly at very few steps (< 4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to improve performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for fixed pretrained teacher models and avoids standard adversarial networks during training, making it well suited to building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines such as SANA-Sprint (a GAN loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by roughly 100× with only minor quality degradation. The project page is available at https://zhenglin-cheng.com/twinflow.
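To make the NFE comparison in the abstract concrete, here is a minimal, self-contained sketch (not TwinFlow's actual algorithm or code; the toy `velocity_model`, `sample_multistep`, and `sample_one_step` names are hypothetical stand-ins for a trained flow-matching network and its samplers). It contrasts standard multi-step Euler integration, whose cost scales with the number of function evaluations, against a single-call generator, which is what a 1-NFE model amounts to at inference time.

```python
# Illustrative sketch only: a toy velocity field stands in for a large trained
# flow-matching model. The point is the evaluation count, not sample quality.
import torch


def velocity_model(x: torch.Tensor, t: float) -> torch.Tensor:
    """Hypothetical velocity field v(x, t); a real model is a large network."""
    return -0.5 * x  # toy field, independent of t for simplicity


def sample_multistep(noise: torch.Tensor, nfe: int = 100) -> torch.Tensor:
    """Standard flow-matching inference: Euler-integrate dx/dt = v(x, t).

    Cost grows linearly with `nfe`, matching the 40-100 evaluations the
    abstract attributes to multi-step diffusion / flow-matching models.
    """
    x = noise.clone()
    dt = 1.0 / nfe
    for i in range(nfe):
        x = x + dt * velocity_model(x, i * dt)  # one function evaluation per step
    return x


def sample_one_step(noise: torch.Tensor) -> torch.Tensor:
    """One-step generation: a single network call maps noise directly to a sample.

    A generator usable at 1 NFE is what removes the ~100x inference cost.
    """
    return noise + velocity_model(noise, 0.0)  # exactly one evaluation


if __name__ == "__main__":
    latents = torch.randn(4, 16)               # toy batch of 4 "latents"
    slow = sample_multistep(latents, nfe=100)  # 100 NFEs
    fast = sample_one_step(latents)            # 1 NFE
    print(slow.shape, fast.shape)
```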