

Make-A-Shape: a Ten-Million-scale 3D Shape Model

January 20, 2024
Authors: Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
cs.AI

Abstract

Significant progress has been made in training large generative models for natural language and images. Yet, the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, non-compact, and less expressive representations. This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training at a vast scale, capable of utilizing 10 million publicly available shapes. On the technical side, we first innovate a wavelet-tree representation that compactly encodes shapes, formulating a subband coefficient filtering scheme to efficiently exploit coefficient relations. We then make the representation generatable by a diffusion model by devising a subband coefficient packing scheme that lays out the representation in a low-resolution grid. Further, we derive a subband adaptive training strategy so that our model effectively learns to generate both coarse and detail wavelet coefficients. Last, we extend our framework to be controlled by additional input conditions, enabling it to generate shapes from assorted modalities, e.g., single/multi-view images, point clouds, and low-resolution voxels. In our extensive set of experiments, we demonstrate various applications, such as unconditional generation, shape completion, and conditional generation on a wide range of modalities. Our approach not only surpasses the state of the art in delivering high-quality results but also generates shapes efficiently, often in just 2 seconds for most conditions.
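The abstract's core idea is packing multiresolution wavelet coefficients of a shape's signed-distance grid into a low-resolution, multi-channel grid that a diffusion model can generate. The sketch below illustrates that idea only in broad strokes using NumPy and PyWavelets; the wavelet choice (`bior6.8`), the decomposition depth, and the crude "keep only the coarsest subbands" filtering are assumptions for illustration, not the paper's actual subband coefficient filtering or packing scheme.

```python
# Illustrative sketch (not the authors' code): pack coarse + coarsest-level
# detail wavelet coefficients of a 3D shape grid into one compact tensor,
# roughly in the spirit of the wavelet-tree representation described above.
import numpy as np
import pywt  # PyWavelets


def pack_wavelet_grid(sdf: np.ndarray, wavelet: str = "bior6.8", levels: int = 2) -> np.ndarray:
    """Decompose a (truncated) SDF grid and stack the coarse approximation
    with the 7 detail subbands at the same resolution into channels."""
    coeffs = pywt.wavedecn(sdf, wavelet=wavelet, mode="periodization", level=levels)
    coarse = coeffs[0]        # lowest-resolution approximation volume
    detail = coeffs[1]        # coarsest detail subbands: 'aad', 'ada', ..., 'ddd'
    # Discard all finer detail subbands -- a simplistic stand-in for the
    # paper's subband coefficient filtering, kept here only for shape.
    channels = [coarse] + [detail[key] for key in sorted(detail)]
    return np.stack(channels, axis=0)  # (8, D/2^levels, H/2^levels, W/2^levels)


# Toy usage: a 64^3 grid packs into an 8-channel 16^3 grid a diffusion
# model could operate on (the real model works at much higher resolution).
grid = np.random.randn(64, 64, 64).astype(np.float32)
packed = pack_wavelet_grid(grid)
print(packed.shape)  # (8, 16, 16, 16)
```

The design point this mirrors is that the wavelet transform concentrates most of a shape's geometry in a small number of coarse coefficients, so a diffusion model can operate on a far smaller grid than the original volume while still recovering detail from the retained subbands.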