Text-Guided 3D Face Synthesis -- From Generation to Editing

December 1, 2023
Authors: Yunjie Wu, Yapeng Meng, Zhipeng Hu, Lincheng Li, Haoqian Wu, Kun Zhou, Weiwei Xu, Xin Yu
cs.AI

Abstract

Text-guided 3D face synthesis has achieved remarkable results by leveraging text-to-image (T2I) diffusion models. However, most existing works focus solely on direct generation and ignore editing, which restricts them from synthesizing customized 3D faces through iterative adjustments. In this paper, we propose a unified text-guided framework spanning face generation and editing. In the generation stage, we propose a geometry-texture decoupled generation to mitigate the loss of geometric detail caused by coupling. Moreover, decoupling enables us to use the generated geometry as a condition for texture generation, yielding highly geometry-texture-aligned results. We further employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV color spaces. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture according to the text prompt. To enable sequential editing, we introduce a UV-domain consistency preservation regularization that prevents unintended changes to irrelevant facial attributes. In addition, we propose a self-guided consistency weight strategy to improve editing efficacy while preserving consistency. Through comprehensive experiments, we demonstrate our method's superiority in face synthesis. Project page: https://faceg2e.github.io/.
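
The abstract only names the UV-domain consistency preservation regularization and the self-guided consistency weight; below is a minimal PyTorch sketch of one plausible reading, in which edits to UV texels deemed irrelevant to the prompt are penalized. The tensor shapes, the relevance map, and the function name are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a UV-domain consistency preservation loss with a
# self-guided weight map. Shapes and names are assumptions for illustration;
# the paper's actual regularization may differ.
import torch


def consistency_preservation_loss(tex_edited, tex_original, relevance):
    """Penalize changes to UV texels that are irrelevant to the edit prompt.

    tex_edited, tex_original: (B, 3, H, W) UV texture maps in [0, 1].
    relevance: (B, 1, H, W) self-guided map in [0, 1]; 1 marks regions the
        edit is meant to change, 0 marks regions that must stay consistent.
    """
    # Down-weight the penalty where the edit is supposed to act, so only
    # unrelated facial attributes are held close to the original texture.
    consistency_weight = 1.0 - relevance
    diff = (tex_edited - tex_original) ** 2
    return (consistency_weight * diff).mean()


if __name__ == "__main__":
    b, h, w = 1, 256, 256
    tex_orig = torch.rand(b, 3, h, w)
    tex_edit = tex_orig + 0.1 * torch.randn(b, 3, h, w)
    relevance = torch.zeros(b, 1, h, w)
    relevance[..., 100:160, 80:180] = 1.0  # e.g., a mouth region being edited
    loss = consistency_preservation_loss(tex_edit, tex_orig, relevance)
    print(f"consistency loss: {loss.item():.6f}")
```

In this reading, the self-guided weight is simply the complement of an edit-relevance map, so during sequential edits the regularizer anchors untouched regions of the UV texture while leaving the prompted region free to change.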