TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Abstract

Despite recent advances in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level detail remains an unsolved challenge that has received little attention. Existing methods often generate overly smooth back-side surfaces with blurry texture. But how can one effectively capture, from a single image, all the visual attributes of an individual that are needed to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles), which are automatically generated via a garment parsing model and Visual Question Answering (VQA), and 2) a personalized fine-tuned Text-to-Image (T2I) diffusion model, which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts and the personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent and delicate texture and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/tech
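To make the optimization described above more concrete, the following is a minimal toy sketch of how an SDS gradient and a reconstruction loss can be combined. It is not TeCH's implementation. Everything in it is an assumption made for illustration: the "3D human" is reduced to two scalars (a visible front value and an unseen back value), rendering is the identity, the diffusion model is replaced by a hypothetical closed-form noise predictor (`toy_noise_predictor`) whose prior is a single point, and the weighting w(t) = 1 - alpha_bar_t is one common choice rather than a value taken from the paper. The SDS update itself follows the usual form w(t) * (eps_hat - eps).

```python
import math
import random


def toy_noise_predictor(x_t, alpha_bar, prior_mean):
    """Hypothetical stand-in for a T2I diffusion model.

    If the prior is a point mass at prior_mean, the ideal noise estimate
    for x_t = sqrt(a) * x + sqrt(1 - a) * eps has this closed form.
    """
    sa, sn = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - sa * m) / sn for xt, m in zip(x_t, prior_mean)]


def sds_grad(x, prior_mean, rng):
    """One-sample SDS gradient estimate: w(t) * (eps_hat - eps)."""
    alpha_bar = rng.uniform(0.1, 0.9)  # random noise level (assumed range)
    sa, sn = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    x_t = [sa * xi + sn * e for xi, e in zip(x, eps)]  # noised "rendering"
    eps_hat = toy_noise_predictor(x_t, alpha_bar, prior_mean)
    w = 1.0 - alpha_bar  # assumed weighting w(t)
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(observed_front, prior_mean, steps=300, lr=0.1,
             lambda_recon=5.0, seed=0):
    """Gradient descent on theta = [front, back] with SDS + reconstruction."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = sds_grad(theta, prior_mean, rng)
        # Reconstruction term acts only on the observed (front) value:
        # gradient of 0.5 * lambda_recon * (theta[0] - observed_front) ** 2.
        grad[0] += lambda_recon * (theta[0] - observed_front)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    front, back = optimize(observed_front=0.8, prior_mean=[0.2, 0.5])
    print(f"front={front:.3f} back={back:.3f}")
```

In this toy setting the front value is dominated by the reconstruction term and stays close to the observation, while the back value, which has no observation, is driven entirely by the prior through the SDS term. This mirrors, in a very simplified way, the division of labor the abstract describes between the input image and the diffusion guidance.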
