LRM: Large Reconstruction Model for Single Image to 3D

Abstract

We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image. We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects, including both synthetic renderings from Objaverse and real captures from MVImgNet. This combination of a high-capacity model and large-scale training data empowers our model to be highly generalizable and produce high-quality 3D reconstructions from various testing inputs including real-world in-the-wild captures and images from generative models. Video demos and interactable 3D meshes can be found on this website: https://yiconghong.me/LRM/.
