DanceOPD: On-Policy Generative Field Distillation

Abstract

Modern image generation requires a single model that unifies diverse capabilities, including text-to-image (T2I) generation, local editing, and global editing. These capabilities are rarely aligned by default and often conflict: training for editing tends to degrade T2I performance, and global and local editing interfere with each other. Composing these capabilities effectively has therefore become a central challenge in training image generation models. To address it, we introduce DanceOPD, an on-policy generative field distillation framework for flow-matching models. DanceOPD routes each sample to one capability field, queries that field at a single low-noise student-induced state, and trains with a simple velocity mean-squared-error (MSE) objective. Each capability source is defined as a velocity field over the shared flow state space, so the student learns to compose expert capabilities from fields queried on its own rollout states. The same formulation also absorbs operator-defined fields such as classifier-free guidance (CFG). Comprehensive experiments on T2I, editing, realism-field absorption, and CFG absorption show that our approach improves multi-capability composition, strengthening target capabilities while preserving anchor generation quality. We believe this work establishes a practical route for generative field distillation in flow-matching models.
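The training signal described above (route a sample to one capability field, query it at a low-noise state reached by the student's own rollout, and regress the student's velocity onto it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The names `cfg_field`, `rollout_to`, and `opd_loss`, the toy linear fields, the Euler rollout, the convention that t = 1 is pure noise and t = 0 is data, and the value `t_target=0.2` are all hypothetical choices made for illustration. The sketch only evaluates the loss; in an actual training loop one would presumably treat the rollout state as a constant and backpropagate through the student's prediction.

```python
import numpy as np

def cfg_field(v_cond, v_uncond, scale):
    """Classifier-free guidance written as an operator on two velocity fields."""
    return v_uncond + scale * (v_cond - v_uncond)

def rollout_to(student, x_noise, t_target, n_steps=4):
    """Euler-integrate the student's own velocity from t=1 (assumed: noise) to t_target."""
    x, t = np.asarray(x_noise, dtype=float), 1.0
    dt = (t_target - 1.0) / n_steps  # negative step: moving toward the data end
    for _ in range(n_steps):
        x = x + dt * student(x, t)
        t = t + dt
    return x

def opd_loss(student, fields, task, x_noise, t_target=0.2):
    """Velocity MSE between the student and the routed field at a student-induced state."""
    field = fields[task]                           # route the sample to one capability field
    x_t = rollout_to(student, x_noise, t_target)   # low-noise state from the student's rollout
    residual = student(x_t, t_target) - field(x_t, t_target)
    return float(np.mean(residual ** 2))

# Toy setup (hypothetical): every field is linear in x and ignores t.
student = lambda x, t: 0.5 * x
fields = {
    "t2i": lambda x, t: 0.5 * x,                    # anchor field the student already matches
    "edit": lambda x, t: 0.5 * x + 1.0,             # expert field offset from the student
    "cfg": lambda x, t: cfg_field(0.5 * x, 0.25 * x, scale=3.0),  # operator-defined field
}

x_noise = np.ones(8)
losses = {task: opd_loss(student, fields, task, x_noise) for task in fields}
```

In this toy setup the loss is zero for the field the student already matches and positive for the other two, which is the quantity a gradient step would reduce. Because the query state comes from the student's own rollout rather than from a noised data sample, the target field is evaluated on states the student actually visits at inference time.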