

Mosaic-SDF for 3D Generative Models

December 14, 2023
Authors: Lior Yariv, Omri Puny, Natalia Neverova, Oran Gafni, Yaron Lipman
cs.AI

Abstract

Current diffusion- or flow-based generative models for 3D shapes divide into two categories: those distilling pre-trained 2D image diffusion models, and those training directly on 3D shapes. When training a diffusion or flow model on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff between approximation power and number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all of these principles simultaneously, in this paper we advocate for a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter-efficient, as it only covers the space around the shape's boundary; and it has a simple matrix form compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation on the 3D Warehouse dataset and text-to-3D generation using a dataset of about 600k caption-shape pairs.
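The core idea, approximating a shape's SDF with a set of small local grids placed near its boundary, can be sketched in a few lines of numpy. This is a minimal toy illustration under stated assumptions, not the paper's implementation: the grid count, grid resolution, fixed per-grid scale, tent-shaped blending weights, and the fallback for uncovered points are all choices made here for the demo, and the shape is an analytic sphere whose ground-truth SDF is known in closed form.

```python
import numpy as np

def make_msdf_sphere(n_grids=64, k=7, radius=0.5, seed=0):
    """Build a toy M-SDF-style representation of a sphere:
    n_grids local k^3 grids of SDF samples, centered at boundary points."""
    rng = np.random.default_rng(seed)
    # Sample grid centers on the sphere's surface (where the SDF is 0).
    dirs = rng.normal(size=(n_grids, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    centers = radius * dirs
    scale = 0.2  # half-extent of each local grid (hypothetical choice)
    # Sample the analytic sphere SDF on each local grid.
    ax = np.linspace(-scale, scale, k)
    gx, gy, gz = np.meshgrid(ax, ax, ax, indexing="ij")
    offsets = np.stack([gx, gy, gz], axis=-1)        # (k, k, k, 3)
    pts = centers[:, None, None, None, :] + offsets  # (n, k, k, k, 3)
    values = np.linalg.norm(pts, axis=-1) - radius   # sphere SDF samples
    return centers, scale, values

def trilinear(grid, q, scale):
    """Trilinearly interpolate one k^3 grid at local coords q in [-scale, scale]^3."""
    k = grid.shape[0]
    t = (q + scale) / (2 * scale) * (k - 1)          # map to index space
    i0 = np.clip(np.floor(t).astype(int), 0, k - 2)
    f = t - i0                                       # fractional part in [0, 1]
    out = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                out += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out

def msdf_query(x, centers, scale, values):
    """Approximate the SDF at x as a weighted blend of the grids covering x."""
    q = x - centers                                  # local coords per grid
    inside = np.all(np.abs(q) <= scale, axis=1)
    if not inside.any():
        # Crude fallback for the demo: distance to the nearest grid center.
        return float(np.linalg.norm(q, axis=1).min())
    idx = np.nonzero(inside)[0]
    # Tent-shaped weights that fade to zero toward each grid's border.
    w = np.prod(1 - np.abs(q[idx]) / scale, axis=1)
    vals = np.array([trilinear(values[i], q[i], scale) for i in idx])
    return float(np.dot(w, vals) / w.sum())
```

Each shape is thus a set of (center, values) pairs, i.e. a fixed-size matrix of tokens, which is what makes the representation a natural fit for Transformer-based generative models; the blending weights ensure the approximation stays continuous where neighboring grids overlap.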