Make-A-Shape: a Ten-Million-scale 3D Shape Model

Abstract

Significant progress has been made in training large generative models for natural language and images. Yet the advancement of 3D generative models is hindered by their substantial resource demands for training, along with inefficient, non-compact, and less expressive representations. This paper introduces Make-A-Shape, a new 3D generative model designed for efficient training at a vast scale, capable of utilizing 10 million publicly available shapes. Technically, we first devise a wavelet-tree representation that compactly encodes shapes, formulating a subband coefficient filtering scheme to efficiently exploit relations among coefficients. We then make the representation generatable by a diffusion model by designing a subband coefficient packing scheme that lays out the representation in a low-resolution grid. Further, we derive a subband adaptive training strategy so that the model effectively learns to generate both coarse and detail wavelet coefficients. Last, we extend the framework to be controlled by additional input conditions, enabling it to generate shapes from assorted modalities, e.g., single/multi-view images, point clouds, and low-resolution voxels. In an extensive set of experiments, we demonstrate various applications, such as unconditional generation, shape completion, and conditional generation on a wide range of modalities. Our approach not only surpasses the state of the art in delivering high-quality results but also generates shapes efficiently, within a few seconds and often in just 2 seconds for most conditions.
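To make the idea of a compact wavelet encoding with coefficient filtering more concrete, here is a minimal toy sketch. It is not the Make-A-Shape method: it assumes a 1D signal instead of a 3D grid, uses a plain Haar wavelet, and replaces the paper's subband coefficient filtering scheme with a simple "keep the k largest detail coefficients" rule. All function names (`haar_decompose`, `haar_reconstruct`, `keep_top_k`) are hypothetical and chosen only for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Multi-level orthonormal 1D Haar transform.

    Returns (coarse, details), where details[0] is the finest subband.
    Assumes len(signal) is divisible by 2 ** levels.
    """
    coarse = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(coarse[0::2], coarse[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])  # detail subband
        coarse = [(a + b) / SQRT2 for a, b in pairs]         # coarser approximation
    return coarse, details


def haar_reconstruct(coarse, details):
    """Invert haar_decompose, going from the coarsest level to the finest."""
    out = list(coarse)
    for subband in reversed(details):
        finer = []
        for c, w in zip(out, subband):
            finer.extend([(c + w) / SQRT2, (c - w) / SQRT2])
        out = finer
    return out


def keep_top_k(details, k):
    """Toy filtering (an assumption, not the paper's scheme): keep only the
    k largest-magnitude detail coefficients across all subbands, zero the rest."""
    ranked = sorted(
        ((abs(w), lvl, i) for lvl, d in enumerate(details) for i, w in enumerate(d)),
        reverse=True,
    )
    keep = {(lvl, i) for _, lvl, i in ranked[:k]}
    return [
        [w if (lvl, i) in keep else 0.0 for i, w in enumerate(d)]
        for lvl, d in enumerate(details)
    ]


if __name__ == "__main__":
    # Toy 1D "signed distance" profile: negative inside an interval, positive outside.
    sdf = [abs(i - 7.5) - 3.0 for i in range(16)]
    coarse, details = haar_decompose(sdf, levels=3)
    approx = haar_reconstruct(coarse, keep_top_k(details, k=4))
    print("max abs error with 4 detail coefficients:",
          max(abs(x - y) for x, y in zip(sdf, approx)))
```

With all detail coefficients kept, the reconstruction is exact; dropping most of them trades accuracy for a smaller representation, which is the general trade-off a coefficient filtering scheme has to manage.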
