Text-Guided 3D Face Synthesis -- From Generation to Editing

Abstract

Text-guided 3D face synthesis has achieved remarkable results by leveraging text-to-image (T2I) diffusion models. However, most existing works focus solely on direct generation and ignore editing, which prevents them from synthesizing customized 3D faces through iterative adjustments. In this paper, we propose a unified text-guided framework that spans face generation and editing. In the generation stage, we propose geometry-texture decoupled generation to mitigate the loss of geometric details caused by coupling. Decoupling also lets us use the generated geometry as a condition for texture generation, yielding results in which geometry and texture are closely aligned. We further employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the text. To enable sequential editing, we introduce a UV-domain consistency preservation regularization that prevents unintentional changes to irrelevant facial attributes. In addition, we propose a self-guided consistency weight strategy to improve editing efficacy while preserving consistency. Through comprehensive experiments, we showcase our method's superiority in face synthesis. Project page: https://faceg2e.github.io/.
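
To make the consistency idea more concrete, the following is a minimal sketch of what a weighted penalty on changes in UV space could look like. It is not the authors' implementation: the function name `consistency_penalty`, the single-channel list-of-lists texture layout, and the hand-supplied weight map are all assumptions made for illustration. The abstract does not say how the self-guided weights are computed, so here they are simply given as input.

```python
from typing import List

# Hypothetical layout: a single-channel UV map stored as rows of floats in [0, 1].
Texture = List[List[float]]


def consistency_penalty(before: Texture, after: Texture, weight: Texture) -> float:
    """Weighted mean squared change between two UV maps of equal size.

    A large weight marks a texel that should stay unchanged by the edit;
    a weight of zero leaves that texel free to change. (Assumed convention.)
    """
    total = 0.0
    weight_sum = 0.0
    for row_b, row_a, row_w in zip(before, after, weight):
        for b, a, w in zip(row_b, row_a, row_w):
            total += w * (a - b) ** 2
            weight_sum += w
    # Guard against an all-zero weight map.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    before = [[0.0, 0.0], [0.0, 0.0]]
    after = [[1.0, 0.0], [0.0, 0.0]]   # only the top-left texel was edited
    uniform = [[1.0, 1.0], [1.0, 1.0]]  # penalize change everywhere
    masked = [[0.0, 1.0], [1.0, 1.0]]   # allow change at the edited texel
    print(consistency_penalty(before, after, uniform))
    print(consistency_penalty(before, after, masked))
```

In this toy setup, a uniform weight map penalizes the edit itself, while a map that zeroes out the edited texel penalizes only changes elsewhere. That is the trade-off between editing efficacy and consistency that the abstract refers to.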
