InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

March 20, 2025
Authors: Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu
cs.AI

Abstract

Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.
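
The idea of injecting features through residual connections can be illustrated with a toy sketch. This is not the authors' InfuseNet implementation: the function names, the one-residual-per-block layout, and the `strength` parameter are all assumptions made for illustration, and plain Python lists stand in for tensors. It shows only the general property that an additive side branch leaves the base model's output unchanged when its contribution is zero.

```python
from typing import Callable, List

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def scale(v: Vector, s: float) -> Vector:
    """Multiply every element of v by the scalar s."""
    return [s * x for x in v]


def run_with_residual_injection(
    hidden: Vector,
    base_blocks: List[Callable[[Vector], Vector]],
    identity_residuals: List[Vector],
    strength: float = 1.0,
) -> Vector:
    """Run the base blocks in order, adding one residual after each block.

    Hypothetical layout: one identity-derived residual per base block.
    The abstract does not specify this; it is an assumption of the sketch.
    """
    if len(base_blocks) != len(identity_residuals):
        raise ValueError("expected one residual per base block")
    for block, residual in zip(base_blocks, identity_residuals):
        hidden = add(block(hidden), scale(residual, strength))
    return hidden


def toy_block(h: Vector) -> Vector:
    """Toy stand-in for a base-model block: doubles every element."""
    return [2.0 * x for x in h]


blocks = [toy_block, toy_block]
# Made-up residuals standing in for the output of an identity side branch.
residuals = [[0.5, -0.5], [1.0, 1.0]]

# With strength 0 the side branch contributes nothing, so the base output is recovered.
baseline = run_with_residual_injection([1.0, 2.0], blocks, residuals, strength=0.0)
# With strength 1 each block's output is shifted by its residual.
injected = run_with_residual_injection([1.0, 2.0], blocks, residuals, strength=1.0)

print(baseline)  # [4.0, 8.0]
print(injected)  # [6.0, 8.0]
```

Because the side branch only adds to the base activations, setting its contribution to zero reproduces the base model exactly. This is one plausible reading of how a residual design could preserve generation capabilities while adding identity information.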
