Real3D: Scaling Up Large Reconstruction Models with Real-World Images
June 12, 2024
Authors: Hanwen Jiang, Qixing Huang, Georgios Pavlakos
cs.AI
Abstract
The default strategy for training single-view Large Reconstruction Models
(LRMs) follows the fully supervised route using large-scale datasets of
synthetic 3D assets or multi-view captures. Although these resources simplify
the training procedure, they are hard to scale up beyond the existing datasets
and they are not necessarily representative of the real distribution of object
shapes. To address these limitations, in this paper, we introduce Real3D, the
first LRM system that can be trained using single-view real-world images.
Real3D introduces a novel self-training framework that can benefit from both
the existing synthetic data and diverse single-view real images. We propose two
unsupervised losses that allow us to supervise LRMs at the pixel- and
semantic-level, even for training examples without ground-truth 3D or novel
views. To further improve performance and scale up the image data, we develop
an automatic data curation approach to collect high-quality examples from
in-the-wild images. Our experiments show that Real3D consistently outperforms
prior work in four diverse evaluation settings that include real and synthetic
data, as well as both in-domain and out-of-domain shapes. Code and model can be
found here: https://hwjiang1510.github.io/Real3D/
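The abstract describes self-training on single-view real images with two unsupervised losses, one at the pixel level and one at the semantic level. Below is a minimal sketch (not the authors' code) of how such an objective could be structured: the predicted 3D representation is re-rendered from the input viewpoint for pixel supervision, while renders from novel viewpoints are compared to the input in a pretrained feature space. The callables `lrm`, `render`, and `semantic_encoder` are hypothetical stand-ins for the reconstruction model, a differentiable renderer, and a frozen image encoder (e.g., CLIP or DINO); the exact losses and weights in Real3D may differ.

```python
# A hedged sketch of a self-training objective for a single-view LRM,
# assuming hypothetical `lrm`, `render`, and `semantic_encoder` callables.
import torch
import torch.nn.functional as F

def self_training_loss(lrm, render, semantic_encoder,
                       image, input_camera, novel_cameras,
                       w_pixel=1.0, w_semantic=0.1):
    """Unsupervised loss for one real image with no ground-truth 3D."""
    # Predict a 3D representation (e.g., a triplane) from a single view.
    rep = lrm(image)

    # Pixel-level supervision: re-render from the input viewpoint and
    # compare against the input image itself.
    recon = render(rep, input_camera)
    pixel_loss = F.mse_loss(recon, image)

    # Semantic-level supervision: novel-view renders have no pixel
    # ground truth, so require them to stay semantically consistent
    # with the input image in a pretrained feature space.
    with torch.no_grad():
        ref_feat = semantic_encoder(image)
    sem_loss = 0.0
    for cam in novel_cameras:
        novel_feat = semantic_encoder(render(rep, cam))
        sem_loss = sem_loss + (1.0 - F.cosine_similarity(
            novel_feat, ref_feat, dim=-1).mean())
    sem_loss = sem_loss / len(novel_cameras)

    return w_pixel * pixel_loss + w_semantic * sem_loss
```

In a full self-training loop, this unsupervised term would be combined with the standard supervised loss on synthetic multi-view data, letting the model learn from both sources as the abstract describes.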