

Real3D: Scaling Up Large Reconstruction Models with Real-World Images

June 12, 2024
作者: Hanwen Jiang, Qixing Huang, Georgios Pavlakos
cs.AI

Abstract

The default strategy for training single-view Large Reconstruction Models (LRMs) follows the fully supervised route using large-scale datasets of synthetic 3D assets or multi-view captures. Although these resources simplify the training procedure, they are hard to scale up beyond the existing datasets and they are not necessarily representative of the real distribution of object shapes. To address these limitations, in this paper, we introduce Real3D, the first LRM system that can be trained using single-view real-world images. Real3D introduces a novel self-training framework that can benefit from both the existing synthetic data and diverse single-view real images. We propose two unsupervised losses that allow us to supervise LRMs at the pixel- and semantic-level, even for training examples without ground-truth 3D or novel views. To further improve performance and scale up the image data, we develop an automatic data curation approach to collect high-quality examples from in-the-wild images. Our experiments show that Real3D consistently outperforms prior work in four diverse evaluation settings that include real and synthetic data, as well as both in-domain and out-of-domain shapes. Code and model can be found here: https://hwjiang1510.github.io/Real3D/
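To make the self-training idea concrete, below is a minimal sketch of what the two unsupervised losses could look like for a single-view image with no ground-truth 3D: a pixel-level photometric term comparing the input image against a re-rendering of the predicted shape from the input viewpoint, and a semantic-level term comparing global feature vectors of the input and a rendered novel view. This is an illustrative assumption, not the paper's exact formulation; the function names, the L2/cosine choices, and the weighting `w_sem` are all hypothetical.

```python
import numpy as np

def pixel_loss(img, rerendered):
    # Pixel-level supervision: L2 photometric error between the real input
    # image and the predicted shape re-rendered from the input viewpoint.
    return float(np.mean((img - rerendered) ** 2))

def semantic_loss(feat_input, feat_novel):
    # Semantic-level supervision: cosine distance between global feature
    # vectors (e.g. from a pretrained image encoder) of the input image
    # and a rendered novel view, so novel views stay semantically faithful.
    a = feat_input / np.linalg.norm(feat_input)
    b = feat_novel / np.linalg.norm(feat_novel)
    return float(1.0 - a @ b)

def self_training_loss(img, rerendered, feat_input, feat_novel, w_sem=0.5):
    # Combined unsupervised objective for one real-world training image;
    # w_sem is a hypothetical weighting between the two terms.
    return pixel_loss(img, rerendered) + w_sem * semantic_loss(feat_input, feat_novel)
```

In an actual LRM training loop, `rerendered` and the novel-view features would come from differentiable rendering of the model's predicted 3D representation, so both terms backpropagate into the reconstruction network without requiring ground-truth 3D or captured novel views.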

