Text-Guided 3D Face Synthesis -- From Generation to Editing
December 1, 2023
Authors: Yunjie Wu, Yapeng Meng, Zhipeng Hu, Lincheng Li, Haoqian Wu, Kun Zhou, Weiwei Xu, Xin Yu
cs.AI
Abstract
Text-guided 3D face synthesis has achieved remarkable results by leveraging
text-to-image (T2I) diffusion models. However, most existing works focus solely
on direct generation and ignore editing, which prevents them from
synthesizing customized 3D faces through iterative adjustments. In this paper,
we propose a unified text-guided framework from face generation to editing. In
the generation stage, we propose a geometry-texture decoupled generation to
mitigate the loss of geometric details caused by coupling. Besides, decoupling
enables us to utilize the generated geometry as a condition for texture
generation, yielding highly geometry-texture aligned results. We further employ
a fine-tuned texture diffusion model to enhance texture quality in both RGB and
YUV space. In the editing stage, we first employ a pre-trained diffusion model
to update facial geometry or texture based on the texts. To enable sequential
editing, we introduce a UV domain consistency preservation regularization,
preventing unintentional changes to irrelevant facial attributes. Besides, we
propose a self-guided consistency weight strategy to improve editing efficacy
while preserving consistency. Through comprehensive experiments, we showcase
our method's superiority in face synthesis. Project page:
https://faceg2e.github.io/.
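The abstract only sketches the UV-domain consistency preservation regularization and the self-guided consistency weight at a high level. Below is a minimal illustrative sketch of one plausible reading of that idea: penalize deviation between the edited and original UV maps, down-weighting the penalty where the edit is actually active so the text-driven change is not suppressed. All names, shapes, and the exponential weighting are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a UV-domain consistency
# regularization with a self-guided per-pixel weight. Shapes and the
# weighting scheme are assumptions made for demonstration only.
import torch


def consistency_regularization(edited_uv: torch.Tensor,
                               original_uv: torch.Tensor,
                               tau: float = 0.1) -> torch.Tensor:
    """Penalize changes in face regions the current edit leaves untouched.

    edited_uv, original_uv: UV texture (or geometry) maps, shape (B, C, H, W).
    tau: scale controlling how quickly the preservation weight decays
         in strongly edited regions.
    """
    # Per-pixel deviation between the edited and original UV maps.
    diff = (edited_uv - original_uv).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)

    # Self-guided weight: barely edited regions (small diff) get a weight
    # near 1 and are strongly preserved; heavily edited regions get a weight
    # near 0 so the intended edit is not penalized there.
    with torch.no_grad():
        weight = torch.exp(-diff / tau)

    # Weighted L2 penalty in UV space.
    return (weight * (edited_uv - original_uv).pow(2)).mean()


if __name__ == "__main__":
    # Toy usage: one RGB UV texture at 256x256 resolution.
    orig = torch.rand(1, 3, 256, 256)
    edit = orig + 0.05 * torch.randn_like(orig)
    print(consistency_regularization(edit, orig).item())
```

In a sequential-editing loop this term would be added to the text-guided editing objective, so that each new edit updates only the attributes named in the prompt while earlier edits and unrelated regions stay intact.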