LRM: Large Reconstruction Model for Single Image to 3D
November 8, 2023
Authors: Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
cs.AI
Abstract
We propose the first Large Reconstruction Model (LRM) that predicts the 3D
model of an object from a single input image within just 5 seconds. In contrast
to many previous methods that are trained on small-scale datasets such as
ShapeNet in a category-specific fashion, LRM adopts a highly scalable
transformer-based architecture with 500 million learnable parameters to
directly predict a neural radiance field (NeRF) from the input image. We train
our model in an end-to-end manner on massive multi-view data containing around
1 million objects, including both synthetic renderings from Objaverse and real
captures from MVImgNet. This combination of a high-capacity model and
large-scale training data empowers our model to be highly generalizable and
produce high-quality 3D reconstructions from various testing inputs including
real-world in-the-wild captures and images from generative models. Video demos
and interactable 3D meshes can be found on this website:
https://yiconghong.me/LRM/.
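
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of a single-image-to-NeRF model trained end-to-end against multi-view renderings. The module names, layer sizes, the learned-query decoder design, and the `render_views` interface are illustrative assumptions; the abstract does not specify LRM's internal architecture, only that a large transformer directly predicts a NeRF from one input image.

```python
# Illustrative sketch only: module names, sizes, and the renderer interface are
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Patchify the single input image and encode it with a ViT-style Transformer."""
    def __init__(self, dim=768, patch=16, depth=6, heads=12):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, img):                                   # img: (B, 3, H, W)
        tokens = self.patchify(img).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens)

class NeRFDecoder(nn.Module):
    """Learned query tokens cross-attend to the image tokens and are decoded
    into a latent scene representation from which radiance/density is read out."""
    def __init__(self, dim=768, n_queries=1024, depth=6, heads=12):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        layer = nn.TransformerDecoderLayer(dim, heads, dim * 4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, depth)
        self.to_field = nn.Linear(dim, 4)   # per-token (RGB, density) proxy

    def forward(self, img_tokens):                            # (B, N, dim)
        q = self.queries.unsqueeze(0).expand(img_tokens.size(0), -1, -1)
        scene_tokens = self.decoder(q, img_tokens)
        return self.to_field(scene_tokens)                    # (B, n_queries, 4)

def training_step(encoder, decoder, render_views, batch, optimizer):
    """End-to-end step (schematic): predict the scene from one view, volume-render
    it at held-out target cameras, and supervise with the ground-truth views.
    `render_views` is a hypothetical differentiable volume renderer."""
    img_tokens = encoder(batch["input_image"])
    scene = decoder(img_tokens)
    pred = render_views(scene, batch["target_cameras"])
    loss = F.mse_loss(pred, batch["target_images"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, scaling comes from the transformer backbone (the abstract cites 500M parameters) and from training the whole pipeline with a simple per-view reconstruction objective over roughly one million objects from Objaverse and MVImgNet.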