TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

December 3, 2025
Authors: Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin
cs.AI

Abstract

Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built on multi-step frameworks such as diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 function evaluations (NFEs)). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or degrade significantly at very few steps (<4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary models that must be trained. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for a fixed pretrained teacher model and avoids standard adversarial networks during training, making it well suited to building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines such as SANA-Sprint (a GAN-loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by 100× with only minor quality degradation. The project page is available at https://zhenglin-cheng.com/twinflow.
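
To make the 40-100-NFE versus 1-NFE comparison concrete, here is a minimal sketch (not the authors' code) contrasting a standard Euler sampler for a flow-matching model with a single-pass generator. The names `velocity_model` and `one_step_model` are hypothetical stand-ins for a trained velocity field and a TwinFlow-style 1-NFE generator:

```python
# Minimal sketch, assuming a rectified-flow convention (noise at t=0, data at t=1).
# Not the authors' implementation; it only illustrates where the NFE count comes from.
import torch

def sample_flow_matching(velocity_model, shape, nfe=100):
    """Euler integration of the flow ODE: one model call per step (nfe NFEs total)."""
    x = torch.randn(shape)                      # start from Gaussian noise
    dt = 1.0 / nfe
    for i in range(nfe):
        t = torch.full((shape[0],), i * dt)     # current time for the batch
        x = x + dt * velocity_model(x, t)       # one function evaluation
    return x

def sample_one_step(one_step_model, shape):
    """A 1-NFE generator maps noise to data in a single forward pass."""
    z = torch.randn(shape)
    return one_step_model(z)                    # 1 NFE: ~100x cheaper than nfe=100
```

Under this accounting, replacing a 100-step sampler with a 1-step generator of the same per-call cost yields the 100× reduction in compute that the abstract reports.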