Real3D: Scaling Up Large Reconstruction Models with Real-World Images
June 12, 2024
Authors: Hanwen Jiang, Qixing Huang, Georgios Pavlakos
cs.AI
Abstract
The default strategy for training single-view Large Reconstruction Models
(LRMs) follows the fully supervised route using large-scale datasets of
synthetic 3D assets or multi-view captures. Although these resources simplify
the training procedure, they are hard to scale up beyond the existing datasets
and they are not necessarily representative of the real distribution of object
shapes. To address these limitations, in this paper, we introduce Real3D, the
first LRM system that can be trained using single-view real-world images.
Real3D introduces a novel self-training framework that can benefit from both
the existing synthetic data and diverse single-view real images. We propose two
unsupervised losses that allow us to supervise LRMs at the pixel- and
semantic-level, even for training examples without ground-truth 3D or novel
views. To further improve performance and scale up the image data, we develop
an automatic data curation approach to collect high-quality examples from
in-the-wild images. Our experiments show that Real3D consistently outperforms
prior work in four diverse evaluation settings that include real and synthetic
data, as well as both in-domain and out-of-domain shapes. Code and model can be
found here: https://hwjiang1510.github.io/Real3D/
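The abstract describes two unsupervised losses that supervise the LRM at the pixel level and the semantic level without ground-truth 3D or novel views. The sketch below is a minimal illustration of how such terms might be combined, not the paper's actual implementation: it assumes the pixel term compares a rendered input view against the real image, and that the semantic term compares deep features (e.g., from an encoder such as CLIP) of a rendered novel view against those of the input image. All function names and the weighting are hypothetical.

```python
import numpy as np

def pixel_loss(rendered, target):
    """Pixel-level term: MSE between the view rendered from the predicted
    3D representation and the original input image (illustrative sketch)."""
    return float(np.mean((np.asarray(rendered) - np.asarray(target)) ** 2))

def semantic_loss(feat_novel, feat_input):
    """Semantic-level term: 1 - cosine similarity between features of a
    rendered novel view and of the input image. Using a CLIP-like encoder
    here is an assumption; the abstract does not name one."""
    num = float(np.dot(feat_novel, feat_input))
    den = float(np.linalg.norm(feat_novel) * np.linalg.norm(feat_input)) + 1e-8
    return 1.0 - num / den

def self_training_loss(rendered_input_view, input_image,
                       feat_novel, feat_input, w_sem=0.5):
    """Combine both unsupervised terms; the weight w_sem is illustrative."""
    return (pixel_loss(rendered_input_view, input_image)
            + w_sem * semantic_loss(feat_novel, feat_input))
```

In a self-training loop, both terms can be evaluated on single-view real images alone, which is what lets training scale beyond datasets with 3D ground truth or multi-view captures.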