

WithAnyone: Towards Controllable and ID Consistent Image Generation

October 16, 2025
Authors: Hengyuan Xu, Wei Cheng, Peng Xing, Yixiao Fang, Shuhan Wu, Rui Wang, Xianfang Zeng, Daxin Jiang, Gang Yu, Xingjun Ma, Yu-Gang Jiang
cs.AI

Abstract

Identity-consistent generation has become an important focus in text-to-image research, with recent models achieving notable success in producing images aligned with a reference identity. Yet, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode we term copy-paste, where the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, or lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct a large-scale paired dataset MultiID-2M, tailored for multi-person scenarios, providing diverse references for each identity; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity. Extensive qualitative and quantitative experiments demonstrate that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that our method achieves high identity fidelity while enabling expressive controllable generation.
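The abstract names a contrastive identity loss that uses paired data to balance fidelity with diversity, but does not spell out its form. Below is a minimal PyTorch sketch of one plausible formulation, assuming an InfoNCE-style contrast over face-recognition embeddings plus an explicit over-similarity penalty against the conditioning reference; the function name, margin, temperature, and argument layout are all hypothetical illustrations, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_identity_loss(gen_emb, ref_emb, same_id_embs, other_id_embs,
                              temperature=0.07, copy_paste_margin=0.9):
    """Hypothetical sketch of a contrastive identity loss on face embeddings.

    gen_emb:       (D,)   embedding of the generated face
    ref_emb:       (D,)   embedding of the reference image conditioned on
    same_id_embs:  (P, D) embeddings of *other* photos of the same identity
    other_id_embs: (N, D) embeddings of different identities (negatives)
    """
    gen = F.normalize(gen_emb, dim=-1)
    ref = F.normalize(ref_emb, dim=-1)
    pos = F.normalize(same_id_embs, dim=-1)
    neg = F.normalize(other_id_embs, dim=-1)

    # InfoNCE: pull the generated face toward other views of the same
    # identity, push it away from other identities.
    pos_sim = pos @ gen / temperature          # (P,)
    neg_sim = neg @ gen / temperature          # (N,)
    logits = torch.cat([pos_sim, neg_sim])
    log_prob = logits.log_softmax(dim=0)
    id_loss = -log_prob[: pos.shape[0]].mean()

    # Copy-paste penalty: discourage near-verbatim replication of the
    # exact reference embedding beyond a similarity margin.
    cp_loss = F.relu(gen @ ref - copy_paste_margin)

    return id_loss + cp_loss
```

In this reading, `same_id_embs` is exactly what a paired dataset like MultiID-2M provides: multiple distinct photos per identity, which is what lets the loss reward identity preservation across pose, expression, and lighting variation rather than pure reconstruction of the reference.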