TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Abstract

Unified image generation and editing models suffer from severe task interference in dense diffusion transformer architectures, where a shared parameter space must compromise between conflicting objectives (e.g., local editing vs. subject-driven generation). While the sparse Mixture-of-Experts (MoE) paradigm is a promising solution, its gating networks remain task-agnostic: they operate on local features and are unaware of global task intent. This task-agnostic nature prevents meaningful specialization and fails to resolve the underlying task interference. In this paper, we propose a novel framework that injects semantic intent into MoE routing. We introduce a Hierarchical Task Semantic Annotation scheme to create structured task descriptors (e.g., scope, type, preservation). We then design Predictive Alignment Regularization to align internal routing decisions with the task's high-level semantics. This regularization turns the gating network from a task-agnostic executor into a dispatch center. Our model effectively mitigates task interference, outperforming dense baselines in fidelity and quality, and our analysis shows that experts naturally develop clear and semantically correlated specializations.
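The abstract does not spell out the form of the regularizer, so the following is only a minimal sketch of one way routing decisions could be aligned with a task descriptor. Everything in it is an assumption made for illustration rather than the authors' method: the top-k softmax gate, the per-sample "routing signature" (mean expert usage over tokens), the linear prediction head, the cosine-based loss, and all function names and dimensions are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(tokens, gate_weights, top_k=2):
    """Top-k softmax gating: each token keeps its k most probable experts."""
    routed = []
    for tok in tokens:
        logits = [sum(w * x for w, x in zip(row, tok)) for row in gate_weights]
        probs = softmax(logits)
        keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in keep)
        # Renormalize the kept experts; all other experts get weight 0.
        routed.append([probs[i] / norm if i in keep else 0.0 for i in range(len(probs))])
    return routed


def routing_signature(routed):
    """Mean routing weight per expert over all tokens of one sample."""
    n = len(routed)
    return [sum(r[e] for r in routed) / n for e in range(len(routed[0]))]


def alignment_loss(signature, head_weights, task_embedding):
    """1 - cosine similarity between a linear projection of the routing
    signature and the task embedding (hypothetical choice of loss)."""
    pred = [sum(w * s for w, s in zip(row, signature)) for row in head_weights]
    dot = sum(p * t for p, t in zip(pred, task_embedding))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in task_embedding))
    return 1.0 - dot / (norm + 1e-8)


# Toy setup with hypothetical sizes: 16 tokens, 4 experts, 3-dim task descriptor.
rng = random.Random(0)
dim, num_experts, task_dim = 8, 4, 3
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
gate_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
head_weights = [[rng.gauss(0, 1) for _ in range(num_experts)] for _ in range(task_dim)]
# Stand-in for an encoded (scope, type, preservation) descriptor.
task_embedding = [1.0, 0.0, 0.0]

routed = route_tokens(tokens, gate_weights)
signature = routing_signature(routed)
loss = alignment_loss(signature, head_weights, task_embedding)
print("routing signature:", [round(s, 3) for s in signature])
print("alignment loss:", round(loss, 3))
```

In a trainable setting, a term of this kind would be added to the generation objective so that gradients reach the gate through the routing signature. The point of the sketch is only that the loss depends on aggregated routing behavior rather than on individual token features, which is how a global task signal could influence an otherwise local gating decision.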
