
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

March 20, 2025
Authors: Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu
cs.AI

Abstract

Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
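
The abstract says only that InfuseNet injects identity features into the DiT base model via residual connections. As a rough illustration of that general idea, and not the authors' implementation, the toy Python sketch below adds a per-block residual to the output of each base block. The names `add_residual`, `run_blocks`, and `scale`, the one-residual-per-block pairing, and the use of plain lists as feature vectors are all assumptions made for this example.

```python
from typing import Callable, List

Vector = List[float]


def add_residual(hidden: Vector, residual: Vector, scale: float = 1.0) -> Vector:
    """Return hidden + scale * residual, element by element."""
    if len(hidden) != len(residual):
        raise ValueError("hidden and residual must have the same length")
    return [h + scale * r for h, r in zip(hidden, residual)]


def run_blocks(
    hidden: Vector,
    base_blocks: List[Callable[[Vector], Vector]],
    identity_residuals: List[Vector],
    scale: float = 1.0,
) -> Vector:
    """Apply each base block, then add the matching identity residual.

    Assumption for this sketch: one precomputed residual per base block.
    The abstract does not specify how residuals are produced or paired.
    """
    if len(base_blocks) != len(identity_residuals):
        raise ValueError("need exactly one residual per base block")
    for block, residual in zip(base_blocks, identity_residuals):
        hidden = add_residual(block(hidden), residual, scale)
    return hidden


# Toy stand-ins for base-model blocks (hypothetical, for illustration only).
toy_blocks = [
    lambda v: [2.0 * x for x in v],
    lambda v: [x + 1.0 for x in v],
]
toy_residuals = [[0.5, 0.5], [1.0, -1.0]]

with_identity = run_blocks([1.0, 2.0], toy_blocks, toy_residuals)
without_identity = run_blocks([1.0, 2.0], toy_blocks, toy_residuals, scale=0.0)
```

With `scale=0.0` the residuals contribute nothing and the toy base blocks run unchanged, which is one way to picture how an additive side branch can leave the base model's behavior intact when it is switched off.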

