MagiCapture: High-Resolution Multi-Concept Portrait Customization
September 13, 2023
Authors: Junha Hyung, Jaeyo Shin, Jaegul Choo
cs.AI
Abstract
Large-scale text-to-image models including Stable Diffusion are capable of
generating high-fidelity photorealistic portrait images. There is an active
research area dedicated to personalizing these models, aiming to synthesize
specific subjects or styles using provided sets of reference images. However,
despite the plausible results from these personalization methods, they often
produce images that fall short of realism and are not yet at a
commercially viable level. This is particularly noticeable in portrait image
generation, where any unnatural artifact in human faces is easily discernible
due to our inherent human bias. To address this, we introduce MagiCapture, a
personalization method for integrating subject and style concepts to generate
high-resolution portrait images using just a few subject and style references.
For instance, given a handful of random selfies, our fine-tuned model can
generate high-quality portrait images in specific styles, such as passport or
profile photos. The main challenge with this task is the absence of ground
truth for the composed concepts, leading to a reduction in the quality of the
final output and an identity shift of the source subject. To address these
issues, we present a novel Attention Refocusing loss coupled with auxiliary
priors, both of which facilitate robust learning within this weakly supervised
learning setting. Our pipeline also includes additional post-processing steps
to ensure the creation of highly realistic outputs. MagiCapture outperforms
the baselines in both quantitative and qualitative evaluations and also
generalizes to non-human objects.