

One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization

June 29, 2023
Authors: Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, Hao Su
cs.AI

Abstract

Single-image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models, but they suffer from lengthy optimization time, 3D-inconsistent results, and poor geometry. In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. Given a single image, we first use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images from the input view, and then aim to lift them to 3D space. Since traditional reconstruction methods struggle with inconsistent multi-view predictions, we build our 3D reconstruction module upon an SDF-based generalizable neural surface reconstruction method and propose several critical training strategies to enable the reconstruction of 360-degree meshes. Without costly optimization, our method reconstructs 3D shapes in significantly less time than existing methods. Moreover, our method produces better geometry, generates more 3D-consistent results, and adheres more closely to the input image. We evaluate our approach on both synthetic data and in-the-wild images and demonstrate its superiority in terms of both mesh quality and runtime. In addition, our approach can seamlessly support the text-to-3D task by integrating with off-the-shelf text-to-image diffusion models.
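To make the two-stage pipeline described in the abstract concrete, here is a minimal Python sketch of the data flow: a single image is fed to a view-conditioned diffusion model (Zero123) to synthesize multi-view images, which are then lifted to a 360-degree mesh by a feed-forward, SDF-based reconstruction module. The function names (zero123_novel_views, sdf_reconstruct), the camera-pose layout, and the mesh format are illustrative assumptions for exposition, not the authors' actual interface; the model calls are stubbed out.

```python
# Hedged sketch of the One-2-3-45 pipeline described in the abstract.
# All function names and data layouts below are placeholders, not the paper's API.

from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class CameraPose:
    """Relative camera pose (degrees) for a predicted novel view."""
    azimuth: float
    elevation: float


def zero123_novel_views(image: np.ndarray, poses: List[CameraPose]) -> List[np.ndarray]:
    """Stand-in for the view-conditioned 2D diffusion model (Zero123).

    In the paper this synthesizes multi-view images of the object from the
    single input view; here we simply return copies as placeholders.
    """
    return [image.copy() for _ in poses]


def sdf_reconstruct(views: List[np.ndarray], poses: List[CameraPose]) -> dict:
    """Stand-in for the SDF-based generalizable neural surface reconstruction
    module that lifts the (possibly inconsistent) multi-view predictions to a
    360-degree textured mesh in one feed-forward pass.
    """
    return {"vertices": np.zeros((0, 3)), "faces": np.zeros((0, 3), dtype=int)}


def image_to_mesh(image: np.ndarray) -> dict:
    """End-to-end sketch: single image -> multi-view images -> textured mesh."""
    # Assumed pose layout: a ring of views around the object at fixed elevation.
    poses = [CameraPose(azimuth=a, elevation=30.0) for a in range(0, 360, 45)]
    views = zero123_novel_views(image, poses)   # stage 1: multi-view synthesis
    return sdf_reconstruct(views, poses)        # stage 2: feed-forward reconstruction


if __name__ == "__main__":
    rgb = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy input image
    mesh = image_to_mesh(rgb)
    print(len(mesh["vertices"]), "vertices")
```

Because the second stage is a single feed-forward network rather than a per-shape optimization loop, the whole pipeline runs in roughly the time of one diffusion sampling pass plus one reconstruction pass, which is what allows the reported ~45-second runtime.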