GALA: Generating Animatable Layered Assets from a Single Scan

Abstract

We present GALA, a framework that takes a single-layer clothed 3D human mesh as input and decomposes it into complete multi-layered 3D assets. The outputs can then be combined with other assets to create novel clothed human avatars in any pose. Existing reconstruction approaches often treat a clothed human as a single layer of geometry and overlook the inherent compositionality of humans with hairstyles, clothing, and accessories, which limits the utility of the resulting meshes for downstream applications. Decomposing a single-layer mesh into separate layers is challenging because it requires synthesizing plausible geometry and texture for severely occluded regions. Moreover, even after a successful decomposition, the meshes are not normalized in terms of pose and body shape, so they cannot be composed coherently with novel identities and poses. To address these challenges, we propose to leverage the general knowledge of a pretrained 2D diffusion model as a geometry and appearance prior for humans and other assets. We first separate the input mesh using a 3D surface segmentation extracted from multi-view 2D segmentations. We then synthesize the missing geometry of the different layers in both posed and canonical spaces using a novel pose-guided Score Distillation Sampling (SDS) loss. Once the high-fidelity 3D geometry has been inpainted, we apply the same SDS loss to its texture to obtain the complete appearance, including the initially occluded regions. Through this series of decomposition steps, we obtain multiple layers of 3D assets in a shared canonical space normalized in terms of pose and human shape, which supports effortless composition with novel identities and reanimation with novel poses. Our experiments demonstrate the effectiveness of our approach on decomposition, canonicalization, and composition tasks compared to existing solutions.
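
The abstract does not say how the multi-view 2D segmentations are aggregated into a 3D surface segmentation. As a rough illustration only, the sketch below assumes a simple majority vote per vertex; the function name `lift_labels_to_vertices`, the per-view dictionary format, and the label strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def lift_labels_to_vertices(num_vertices, views):
    """Assign each mesh vertex the label it receives most often across views.

    `views` is a list of dicts, one per rendered view, mapping the index of a
    visible vertex to the 2D segmentation label at its projected pixel.
    Vertices occluded in a view are simply absent from that view's dict.
    (This input format is an assumption made for the sketch.)
    """
    votes = [Counter() for _ in range(num_vertices)]
    for view in views:
        for vertex, label in view.items():
            votes[vertex][label] += 1
    # Vertices never seen in any view get None; in a pipeline like the one
    # described above, such regions would be left for later inpainting.
    return [v.most_common(1)[0][0] if v else None for v in votes]


# Toy example: four vertices observed from three views.
views = [
    {0: "body", 1: "jacket", 2: "jacket"},
    {0: "body", 1: "jacket", 2: "body"},
    {1: "jacket", 2: "jacket"},
]
labels = lift_labels_to_vertices(4, views)
print(labels)
```

Voting across views makes the per-vertex label robust to a single view's segmentation error (vertex 2 above), while vertices that no view observes stay unlabeled rather than receiving a guessed label.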
