UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures

Abstract

Recent advances in 3D avatar generation have attracted significant attention. These advances aim to produce more realistic, animatable avatars, narrowing the gap between virtual and real-world experiences. Most existing works employ a Score Distillation Sampling (SDS) loss, combined with a differentiable renderer and a text condition, to guide a diffusion model in generating 3D avatars. However, SDS often produces oversmoothed results with few facial details and lacks diversity compared with ancestral sampling. Other works generate 3D avatars from a single image, but unwanted lighting effects, perspective views, and inferior image quality make it difficult to reliably reconstruct 3D face meshes with aligned, complete textures. In this paper, we propose a novel 3D avatar generation approach, termed UltrAvatar, with enhanced geometric fidelity and high-quality physically based rendering (PBR) textures free of unwanted lighting. To this end, the proposed approach presents a diffuse color extraction model and an authenticity guided texture diffusion model. The former removes unwanted lighting effects to reveal the true diffuse colors, so that the generated avatars can be rendered under various lighting conditions. The latter follows two forms of gradient-based guidance when generating PBR textures, so that they render diverse face-identity features and details and align better with the 3D mesh geometry. Experiments demonstrate the effectiveness and robustness of the proposed method, which outperforms state-of-the-art methods by a large margin.
