LRM: Large Reconstruction Model for Single Image to 3D
November 8, 2023
Authors: Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
cs.AI
Abstract
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image. We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects, including both synthetic renderings from Objaverse and real captures from MVImgNet. This combination of a high-capacity model and large-scale training data empowers our model to be highly generalizable and produce high-quality 3D reconstructions from various testing inputs, including real-world in-the-wild captures and images from generative models. Video demos and interactable 3D meshes can be found on this website: https://yiconghong.me/LRM/.
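To make the single-image-to-NeRF idea concrete, below is a minimal, hypothetical PyTorch sketch of a transformer that maps one input image to a set of 3D scene tokens. The encoder choice, token layout, layer counts, and output feature size are illustrative assumptions; the abstract only states that a large transformer directly predicts a NeRF from the input image.

```python
# Conceptual sketch only: module names and hyperparameters are assumptions,
# not the LRM architecture. It shows the overall flow: image tokens -> cross-
# attention transformer -> scene tokens that parameterize a NeRF.
import torch
import torch.nn as nn

class ImageToNeRF(nn.Module):
    def __init__(self, d_model=1024, n_layers=16, n_scene_tokens=3 * 32 * 32):
        super().__init__()
        # Patchify the input image into tokens (stand-in for a pretrained image encoder).
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # Learnable queries that become the 3D scene representation tokens.
        self.scene_tokens = nn.Parameter(torch.randn(n_scene_tokens, d_model) * 0.02)
        # Transformer decoder: scene tokens cross-attend to image features.
        layer = nn.TransformerDecoderLayer(d_model, nhead=16, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        # Project each token to NeRF features (e.g. density/color features to be volume-rendered).
        self.to_nerf = nn.Linear(d_model, 40)

    def forward(self, image):  # image: (B, 3, 512, 512)
        feats = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N_img, d)
        queries = self.scene_tokens.unsqueeze(0).expand(image.shape[0], -1, -1)
        tokens = self.decoder(queries, feats)  # (B, n_scene_tokens, d)
        return self.to_nerf(tokens)            # per-token NeRF features

model = ImageToNeRF()
nerf_features = model(torch.randn(1, 3, 512, 512))
print(nerf_features.shape)  # torch.Size([1, 3072, 40])
```

In such a design, training end-to-end on multi-view data would supervise the volume-rendered images produced from these tokens against ground-truth views; the rendering and loss code are omitted here.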