MagiCapture: High-Resolution Multi-Concept Portrait Customization

September 13, 2023
Authors: Junha Hyung, Jaeyo Shin, Jaegul Choo
cs.AI

Abstract

Large-scale text-to-image models, including Stable Diffusion, are capable of generating high-fidelity photorealistic portrait images. There is an active research area dedicated to personalizing these models, aiming to synthesize specific subjects or styles using provided sets of reference images. However, although these personalization methods yield plausible results, the images they produce often fall short of realism and are not yet at a commercially viable level. This is particularly noticeable in portrait image generation, where any unnatural artifact in human faces is easily discernible due to our inherent human bias. To address this, we introduce MagiCapture, a personalization method for integrating subject and style concepts to generate high-resolution portrait images using just a few subject and style references. For instance, given a handful of random selfies, our fine-tuned model can generate high-quality portrait images in specific styles, such as passport or profile photos. The main challenge with this task is the absence of ground truth for the composed concepts, which leads to a reduction in the quality of the final output and an identity shift of the source subject. To address these issues, we present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning within this weakly supervised learning setting. Our pipeline also includes additional post-processing steps to ensure the creation of highly realistic outputs. MagiCapture outperforms other baselines in both quantitative and qualitative evaluations and can also be generalized to other non-human objects.