Real3D: Scaling Up Large Reconstruction Models with Real-World Images

Abstract

The default strategy for training single-view Large Reconstruction Models (LRMs) is fully supervised learning on large-scale datasets of synthetic 3D assets or multi-view captures. Although these resources simplify training, they are hard to scale beyond the existing datasets and are not necessarily representative of the real distribution of object shapes. To address these limitations, we introduce Real3D, the first LRM system that can be trained using single-view real-world images. Real3D introduces a novel self-training framework that benefits from both existing synthetic data and diverse single-view real images. We propose two unsupervised losses that supervise LRMs at the pixel level and the semantic level, even for training examples without ground-truth 3D or novel views. To further improve performance and scale up the image data, we develop an automatic data curation approach that collects high-quality examples from in-the-wild images. Our experiments show that Real3D consistently outperforms prior work in four diverse evaluation settings that include real and synthetic data, as well as both in-domain and out-of-domain shapes. Code and model are available at https://hwjiang1510.github.io/Real3D/
