CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
March 8, 2024
Authors: Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu
cs.AI
Abstract
Feed-forward 3D generative models like the Large Reconstruction Model (LRM)
have demonstrated exceptional generation speed. However, the transformer-based
methods do not leverage the geometric priors of the triplane component in their
architecture, often leading to sub-optimal quality given the limited size of 3D
data and slow training. In this work, we present the Convolutional
Reconstruction Model (CRM), a high-fidelity feed-forward single image-to-3D
generative model. Recognizing the limitations posed by sparse 3D data, we
highlight the necessity of integrating geometric priors into network design.
CRM builds on the key observation that the visualization of a triplane exhibits
spatial correspondence with six orthographic images. First, it generates six
orthographic view images from a single input image, then feeds these images
into a convolutional U-Net, leveraging its strong pixel-level alignment
capabilities and significant bandwidth to create a high-resolution triplane.
CRM further employs Flexicubes as its geometric representation, facilitating
direct end-to-end optimization on textured meshes. Overall, our model delivers a
high-fidelity textured mesh from an image in just 10 seconds, without any
test-time optimization.
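The core architectural idea — stacking six orthographic view images along the channel dimension and mapping them through a convolutional U-Net to a high-resolution triplane — can be illustrated with a toy sketch. This is a hypothetical miniature, not the authors' actual CRM architecture; the class name, channel counts, and single down/up stage are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class MiniTriplaneUNet(nn.Module):
    """Toy sketch: six orthographic RGB views (6 x 3 = 18 input channels)
    -> features for three axis-aligned triplane planes.
    Hypothetical illustration, not the CRM paper's network."""

    def __init__(self, in_ch=18, feat=32, tri_ch=8):
        super().__init__()
        # one encoder stage, one downsample, one upsample with a skip connection
        self.enc = nn.Sequential(nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.Conv2d(feat, feat * 2, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(feat * 2, feat, 2, stride=2), nn.ReLU())
        # skip-concatenated features projected to 3 planes x tri_ch channels
        self.head = nn.Conv2d(feat * 2, 3 * tri_ch, 3, padding=1)

    def forward(self, views):
        # views: (B, 6, 3, H, W) -> stack views along channels: (B, 18, H, W)
        b, v, c, h, w = views.shape
        x = views.reshape(b, v * c, h, w)
        e = self.enc(x)                          # (B, feat, H, W)
        u = self.up(self.down(e))                # back to (B, feat, H, W)
        y = self.head(torch.cat([u, e], dim=1))  # (B, 3*tri_ch, H, W)
        # split into three axis-aligned planes: (B, 3, tri_ch, H, W)
        return y.reshape(b, 3, -1, h, w)

views = torch.randn(2, 6, 3, 64, 64)        # batch of six-view image sets
triplane = MiniTriplaneUNet()(views)
print(triplane.shape)                        # torch.Size([2, 3, 8, 64, 64])
```

Because the network is fully convolutional, the output triplane keeps the spatial resolution of the input views — the pixel-level alignment the abstract attributes to the U-Net design.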