Make-A-Shape: a Ten-Million-scale 3D Shape Model
January 20, 2024
Authors: Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
cs.AI
Abstract
Significant progress has been made in training large generative models for
natural language and images. Yet, the advancement of 3D generative models is
hindered by their substantial resource demands for training, along with
inefficient, non-compact, and less expressive representations. This paper
introduces Make-A-Shape, a new 3D generative model designed for efficient
training on a vast scale, capable of utilizing 10 million publicly available
shapes. On the technical side, we first introduce a wavelet-tree representation to
compactly encode shapes by formulating the subband coefficient filtering scheme
to efficiently exploit coefficient relations. We then make the representation
generatable by a diffusion model by devising a subband coefficient packing
scheme that lays out the representation in a low-resolution grid. Further, we
derive a subband-adaptive training strategy so that our model effectively
learns to generate both coarse and detail wavelet coefficients. Finally, we extend our
framework to be controlled by additional input conditions to enable it to
generate shapes from assorted modalities, e.g., single/multi-view images, point
clouds, and low-resolution voxels. In our extensive set of experiments, we
demonstrate various applications, such as unconditional generation, shape
completion, and conditional generation on a wide range of modalities. Our
approach not only surpasses the state of the art in delivering high-quality
results but also generates shapes efficiently, within a few seconds and often
in just 2 seconds for most conditions.
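To make the wavelet-tree idea concrete, below is a minimal Python sketch (not the authors' implementation) that decomposes a toy 3D signed-distance grid into a coarse subband plus detail subbands using PyWavelets. The grid resolution, wavelet family (bior2.2), and number of decomposition levels are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch, assuming NumPy and PyWavelets are installed
# (pip install numpy PyWavelets); this is NOT the paper's implementation.
import numpy as np
import pywt

# Toy "shape": the signed-distance field of a sphere sampled on a 64^3 grid.
res = 64
axis = np.linspace(-1.0, 1.0, res)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

# Multi-level 3D wavelet decomposition. coeffs[0] holds the coarse
# (approximation) subband; coeffs[1:] are dicts of detail subbands
# ('aad', 'ada', ..., 'ddd'), ordered from coarsest to finest level.
# The wavelet family and level count here are illustrative choices.
wavelet, levels = "bior2.2", 3
coeffs = pywt.wavedecn(sdf, wavelet=wavelet, level=levels)

print("coarse subband shape:", coeffs[0].shape)
for lvl, detail in enumerate(coeffs[1:], start=1):
    energy = sum(float(np.square(c).sum()) for c in detail.values())
    print(f"detail level {lvl} (coarsest to finest), total energy: {energy:.3f}")

# Zeroing the finest detail subbands still reconstructs a faithful coarse
# shape, which is why a filtered, tree-structured set of coefficients can
# serve as a compact stand-in for the full-resolution grid.
coeffs[-1] = {k: np.zeros_like(v) for k, v in coeffs[-1].items()}
recon = pywt.waverecn(coeffs, wavelet=wavelet)[:res, :res, :res]
print("max error after dropping finest details:", float(np.abs(recon - sdf).max()))
```

In the actual model, such coefficients are further filtered and packed into a low-resolution grid so that a diffusion model can generate them; the sketch above only illustrates why coarse-plus-detail wavelet subbands form a compact encoding of a shape.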