MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing
May 5, 2025
Authors: Zinan Guo, Pengze Zhang, Yanze Wu, Chong Mou, Songtao Zhao, Qian He
cs.AI
Abstract
Current multi-subject customization approaches encounter two critical
challenges: the difficulty in acquiring diverse multi-subject training data,
and attribute entanglement across different subjects. To address these challenges, we
propose MUSAR, a simple yet effective framework that achieves robust
multi-subject customization while requiring only single-subject training data.
Firstly, to overcome the data limitation, we introduce debiased diptych learning.
It constructs diptych training pairs from single-subject images to facilitate
multi-subject learning, while actively correcting the distribution bias
introduced by diptych construction via static attention routing and dual-branch
LoRA. Secondly, to eliminate cross-subject entanglement, we introduce a dynamic
attention routing mechanism, which adaptively establishes bijective mappings
between generated images and conditional subjects. This design not only
achieves decoupling of multi-subject representations but also maintains
scalable generalization performance with increasing reference subjects.
Comprehensive experiments demonstrate that MUSAR outperforms existing
methods, even those trained on multi-subject datasets, in image quality,
subject consistency, and interaction naturalness, despite requiring only
a single-subject dataset.
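
The abstract does not spell out how diptychs are built or how static attention routing is applied, so the following is only a toy sketch under stated assumptions: a diptych is taken to be two single-subject images of equal height placed side by side, and static routing is taken to be a fixed boolean mask that lets each panel's tokens attend only to its own reference subject. The function names, the token ordering, and the mask layout are hypothetical and are not taken from the MUSAR implementation.

```python
# Toy illustration only; names and layouts are assumptions, not the authors' code.

def make_diptych(left, right):
    """Join two images (lists of pixel rows) of equal height side by side."""
    if len(left) != len(right):
        raise ValueError("both images must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def static_routing_mask(n_left, n_right, n_ref_a, n_ref_b):
    """Build mask[q][k]: True if image token q may attend to reference token k.

    Assumed ordering: queries are [left-panel tokens, right-panel tokens],
    keys are [reference A tokens, reference B tokens]. Left-panel tokens see
    only reference A; right-panel tokens see only reference B.
    """
    mask = []
    for q in range(n_left + n_right):
        in_left = q < n_left
        mask.append([in_left] * n_ref_a + [not in_left] * n_ref_b)
    return mask


if __name__ == "__main__":
    diptych = make_diptych([[1, 2], [3, 4]], [[5], [6]])
    print(diptych)
    for row in static_routing_mask(2, 1, 1, 2):
        print(row)
```

In a real attention layer, a mask of this shape would typically be applied to the attention logits before the softmax. The dynamic routing described in the abstract would replace the fixed left/right split with an assignment inferred during generation, which this sketch does not attempt to model.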