Single-View 3D Human Digitalization with Large Reconstruction Models

Abstract

In this paper, we introduce Human-LRM, a single-stage feed-forward Large Reconstruction Model designed to predict human Neural Radiance Fields (NeRF) from a single image. Our approach demonstrates remarkable adaptability when trained on extensive datasets containing 3D scans and multi-view captures. Furthermore, to improve the model's applicability to in-the-wild scenarios, especially those with occlusions, we propose a novel strategy that distills multi-view reconstruction into single-view reconstruction via a conditional triplane diffusion model. This generative extension addresses the inherent variations in human body shape when observed from a single view, and makes it possible to reconstruct the full human body from an occluded image. Through extensive experiments, we show that Human-LRM surpasses previous methods by a significant margin on several benchmarks.
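The abstract refers to a triplane representation without describing how a NeRF is read out of it. The sketch below shows the generic, EG3D-style lookup, assuming (as the mention of a triplane diffusion model suggests) that the predicted NeRF is stored as three axis-aligned feature planes. A 3D point is projected onto the xy, xz and yz planes, each plane is bilinearly sampled, and the three feature vectors are combined. The function names, the [-1, 1] coordinate convention, and the choice to sum (rather than concatenate) the plane features are illustrative assumptions, not details taken from Human-LRM.

```python
import math
from typing import List, Sequence

# Hypothetical layout: a feature plane is an H x W grid of C-dimensional vectors.
Plane = List[List[List[float]]]


def bilinear_sample(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample `plane` at normalized coordinates u, v in [-1, 1].

    u indexes columns (width), v indexes rows (height); coordinates outside
    [-1, 1] are clamped to the border of the grid.
    """
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] to continuous pixel coordinates [0, w-1] and [0, h-1].
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1.0)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        # Interpolate along x on the two neighbouring rows, then along y.
        top = (1 - tx) * plane[y0][x0][c] + tx * plane[y0][x1][c]
        bottom = (1 - tx) * plane[y1][x0][c] + tx * plane[y1][x1][c]
        out.append((1 - ty) * top + ty * bottom)
    return out


def query_triplane(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Project a 3D point onto the xy, xz and yz planes and sum the features.

    In a triplane NeRF, a small MLP would then decode this feature vector into
    density and color; that decoder is omitted here.
    """
    plane_xy, plane_xz, plane_yz = planes
    x, y, z = point
    samples = [
        bilinear_sample(plane_xy, x, y),
        bilinear_sample(plane_xz, x, z),
        bilinear_sample(plane_yz, y, z),
    ]
    # Element-wise sum across the three planes (concatenation is another option).
    return [sum(vals) for vals in zip(*samples)]


if __name__ == "__main__":
    plane_xy = [[[0.0], [1.0]], [[2.0], [3.0]]]  # 2 x 2 grid, 1 channel
    plane_xz = [[[1.0], [1.0]], [[1.0], [1.0]]]
    plane_yz = [[[2.0], [2.0]], [[2.0], [2.0]]]
    print(query_triplane([plane_xy, plane_xz, plane_yz], (0.0, 0.0, 0.0)))  # [4.5]
```

At the origin, the xy plane contributes the average of its four corners (1.5), while the constant xz and yz planes contribute 1.0 and 2.0. Summing keeps the feature width fixed regardless of how many planes are used. Concatenation, by contrast, triples the width but lets the decoder tell the planes apart.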
