HeadSculpt: Crafting 3D Head Avatars with Text

Abstract

Recently, text-guided 3D generative methods have made remarkable advances in producing high-quality textures and geometry, capitalizing on the proliferation of large vision-language and image diffusion models. However, existing methods still struggle to create high-fidelity 3D head avatars in two respects: (1) They rely mostly on a pre-trained text-to-image diffusion model while lacking the necessary 3D awareness and head priors. This makes them prone to inconsistency and geometric distortions in the generated avatars. (2) They fall short in fine-grained editing. This is primarily due to limitations inherited from the pre-trained 2D image diffusion models, which become more pronounced for 3D head avatars. In this work, we address these challenges by introducing a versatile coarse-to-fine pipeline dubbed HeadSculpt for crafting (i.e., generating and editing) 3D head avatars from textual prompts. Specifically, we first equip the diffusion model with 3D awareness by leveraging landmark-based control and a learned textual embedding representing the back-view appearance of heads, enabling 3D-consistent head avatar generation. We further propose a novel identity-aware editing score distillation strategy to optimize a textured mesh with a high-resolution differentiable rendering technique. This enables identity preservation while following the editing instruction. We showcase HeadSculpt's superior fidelity and editing capabilities through comprehensive experiments and comparisons with existing methods.
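The score distillation idea mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the standard formulation in which the gradient with respect to the rendered image is a weighted difference between the noise predicted by the diffusion model and the noise that was injected. The linear blend of an edit-conditioned and an identity-conditioned prediction, and the names `blend_noise_predictions`, `sds_image_gradient`, and `edit_weight`, are hypothetical illustrations and are not taken from the paper.

```python
from typing import List


def blend_noise_predictions(
    eps_edit: List[float], eps_identity: List[float], edit_weight: float
) -> List[float]:
    """Linearly interpolate two noise predictions (hypothetical blend, an assumption).

    edit_weight = 1.0 follows the editing instruction only;
    edit_weight = 0.0 follows the identity-preserving prediction only.
    """
    if not 0.0 <= edit_weight <= 1.0:
        raise ValueError("edit_weight must lie in [0, 1]")
    if len(eps_edit) != len(eps_identity):
        raise ValueError("noise predictions must have the same length")
    return [
        edit_weight * e + (1.0 - edit_weight) * i
        for e, i in zip(eps_edit, eps_identity)
    ]


def sds_image_gradient(
    eps_pred: List[float], eps_true: List[float], weight: float = 1.0
) -> List[float]:
    """Score-distillation gradient w.r.t. the rendered image: w * (eps_pred - eps_true).

    In a full pipeline this gradient would be backpropagated through a
    differentiable renderer to the mesh and texture parameters.
    """
    if len(eps_pred) != len(eps_true):
        raise ValueError("noise vectors must have the same length")
    return [weight * (p - t) for p, t in zip(eps_pred, eps_true)]


# Toy values standing in for flattened per-pixel noise (illustrative only).
eps_true = [0.1, -0.2, 0.3]
eps_edit = [0.5, 0.0, -0.1]
eps_identity = [0.1, -0.2, 0.3]

eps_blend = blend_noise_predictions(eps_edit, eps_identity, edit_weight=0.5)
grad = sds_image_gradient(eps_blend, eps_true)
```

With `edit_weight` near 0 the gradient vanishes whenever the identity-conditioned prediction already matches the injected noise, which is the intuition behind trading off identity preservation against edit strength in this toy setup.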
