
Mosaic-SDF for 3D Generative Models

December 14, 2023
Authors: Lior Yariv, Omri Puny, Natalia Neverova, Oran Gafni, Yaron Lipman
cs.AI

Abstract

Current diffusion or flow-based generative models for 3D shapes fall into two categories: those that distill pre-trained 2D image diffusion models, and those trained directly on 3D shapes. When training diffusion or flow models on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff between approximation power and number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all these principles simultaneously, we advocate in this paper a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape by using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter efficient, as it only covers the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation with the 3D Warehouse dataset and text-to-3D generation using a dataset of about 600k caption-shape pairs.
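To make the idea of "a set of local grids spread near the shape's boundary" with "a simple matrix form" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify how grid centers are chosen, how grid values are fitted, or how overlapping grids are blended, so everything below is an assumption made for illustration: the function names (`build_msdf`, `eval_msdf`, and the helpers), the row layout `[center, scale, k**3 values]`, the use of a unit sphere with a known exact SDF, Fibonacci sampling of centers, and plain averaging of overlapping grids.

```python
import math

def sphere_sdf(x, y, z, radius=1.0):
    """Exact signed distance to a sphere centered at the origin (toy shape)."""
    return math.sqrt(x * x + y * y + z * z) - radius

def fibonacci_sphere(n, radius=1.0):
    """Roughly uniform points on a sphere; a stand-in for boundary sampling."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        points.append((radius * r * math.cos(golden * i),
                       radius * r * math.sin(golden * i),
                       radius * z))
    return points

def build_msdf(sdf, centers, scale, k):
    """Return one row per local grid: [cx, cy, cz, scale, k**3 SDF samples].

    Assumption: grid values are exact SDF samples; no fitting step is modeled.
    """
    rows = []
    for cx, cy, cz in centers:
        values = []
        for a in range(k):
            for b in range(k):
                for c in range(k):
                    # Grid nodes span [-scale, scale] around the center.
                    ox = scale * (2.0 * a / (k - 1) - 1.0)
                    oy = scale * (2.0 * b / (k - 1) - 1.0)
                    oz = scale * (2.0 * c / (k - 1) - 1.0)
                    values.append(sdf(cx + ox, cy + oy, cz + oz))
        rows.append([cx, cy, cz, scale] + values)
    return rows

def trilinear(values, k, u, v, w):
    """Interpolate a k*k*k grid at continuous index coordinates in [0, k-1]."""
    i, j, l = (min(int(t), k - 2) for t in (u, v, w))
    fu, fv, fw = u - i, v - j, w - l
    total = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dl in (0, 1):
                weight = ((fu if di else 1.0 - fu)
                          * (fv if dj else 1.0 - fv)
                          * (fw if dl else 1.0 - fw))
                total += weight * values[((i + di) * k + (j + dj)) * k + (l + dl)]
    return total

def eval_msdf(rows, k, x, y, z):
    """Average the local grids containing the query point; None if uncovered.

    Assumption: plain averaging; the blending rule is not given in the abstract.
    """
    estimates = []
    for row in rows:
        cx, cy, cz, scale = row[:4]
        local = [(x - cx) / scale, (y - cy) / scale, (z - cz) / scale]
        if all(-1.0 <= t <= 1.0 for t in local):
            u, v, w = ((t + 1.0) * 0.5 * (k - 1) for t in local)
            estimates.append(trilinear(row[4:], k, u, v, w))
    return sum(estimates) / len(estimates) if estimates else None

if __name__ == "__main__":
    k = 5
    rows = build_msdf(sphere_sdf, fibonacci_sphere(64), scale=0.3, k=k)
    print(len(rows), "rows of length", len(rows[0]))
    print("near the surface:", eval_msdf(rows, k, 0.0, 0.0, 1.05))
    print("far from the surface:", eval_msdf(rows, k, 0.0, 0.0, 0.0))
```

In this sketch the whole shape becomes a list of equal-length rows, so it can be treated as a matrix with one row per local grid, which is the kind of set-like input a Transformer-style model consumes. Queries far from the boundary return `None` here because no grid covers them, reflecting that the representation only spends parameters near the surface.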