Qwen-Image-Flash: Beyond Objective Design

Abstract

Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image generation and instruction-guided image editing distillation: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.