Mosaic-SDF for 3D Generative Models

Abstract

Current diffusion- and flow-based generative models for 3D shapes fall into two categories: those that distill pre-trained 2D image diffusion models, and those trained directly on 3D shapes. When training diffusion or flow models on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow efficient conversion of large 3D datasets into the representation; it should provide a good tradeoff between approximation power and number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not satisfy all of these principles simultaneously, in this paper we advocate a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter-efficient, as it only covers the space around the shape's boundary; and it has a simple matrix form that is compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation on the 3D Warehouse dataset and text-to-3D generation using a dataset of about 600k caption-shape pairs.
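The abstract describes M-SDF only at a high level, so the following is a minimal Python sketch of the general idea rather than the paper's implementation. It assumes that each local grid is an axis-aligned k x k x k cube described by a center, a half-width, and k^3 SDF samples. It also assumes that tile values are obtained by directly sampling a known SDF (here the analytic SDF of a unit sphere) at the grid nodes instead of by any fitting procedure, and that overlapping tiles are blended with a hypothetical weight that decays linearly to zero at the cube boundary. The names `LocalGrid`, `fit_grid`, `eval_msdf` and `to_matrix` are illustrative, not taken from the paper. The sketch shows the two properties the abstract stresses: the representation only stores values near the surface, and it flattens into an n x (4 + k^3) matrix with one row per tile, which a Transformer could treat as a set of n tokens.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LocalGrid:
    """Hypothetical M-SDF tile: k x k x k SDF samples on the cube [c - s, c + s]^3."""
    center: Vec3   # tile center c
    scale: float   # half-width s of the cube
    k: int         # samples per axis (k >= 2)
    values: List[float] = field(default_factory=list)  # k**3 samples, index (a*k + b)*k + c

    def node(self, a: int, b: int, c: int) -> Vec3:
        # World position of grid node (a, b, c); nodes span the cube edge to edge.
        step = 2.0 * self.scale / (self.k - 1)
        return tuple(self.center[d] - self.scale + step * i for d, i in enumerate((a, b, c)))

    def sample(self, a: int, b: int, c: int) -> float:
        return self.values[(a * self.k + b) * self.k + c]

    def interpolate(self, x: Vec3) -> float:
        # Trilinear interpolation of the stored samples at a point x inside the cube.
        k = self.k
        u = [(x[d] - self.center[d] + self.scale) / (2.0 * self.scale) * (k - 1) for d in range(3)]
        i0 = [min(max(math.floor(t), 0), k - 2) for t in u]  # lower corner of the cell
        f = [u[d] - i0[d] for d in range(3)]                  # fractional offsets in the cell
        total = 0.0
        for da in (0, 1):
            for db in (0, 1):
                for dc in (0, 1):
                    w = ((f[0] if da else 1.0 - f[0])
                         * (f[1] if db else 1.0 - f[1])
                         * (f[2] if dc else 1.0 - f[2]))
                    total += w * self.sample(i0[0] + da, i0[1] + db, i0[2] + dc)
        return total


def fit_grid(sdf: Callable[[Vec3], float], center: Vec3, scale: float, k: int) -> LocalGrid:
    # Simplification (assumption): sample a known SDF at the nodes instead of fitting to a mesh.
    grid = LocalGrid(center, scale, k)
    grid.values = [sdf(grid.node(a, b, c)) for a in range(k) for b in range(k) for c in range(k)]
    return grid


def eval_msdf(grids: Sequence[LocalGrid], x: Vec3, far: float = 1.0) -> float:
    # Blend every tile that contains x; the weight is 1 at the tile center and
    # falls linearly to 0 at the cube boundary (hypothetical choice, L-inf distance).
    num = den = 0.0
    for g in grids:
        d_inf = max(abs(x[d] - g.center[d]) for d in range(3))
        w = max(0.0, 1.0 - d_inf / g.scale)
        if w > 0.0:
            num += w * g.interpolate(x)
            den += w
    return num / den if den > 0.0 else far  # constant placeholder outside all tiles


def to_matrix(grids: Sequence[LocalGrid]) -> List[List[float]]:
    # One row per tile: [c_x, c_y, c_z, s, v_1, ..., v_{k^3}] -> n x (4 + k^3).
    return [list(g.center) + [g.scale] + list(g.values) for g in grids]


def sphere_sdf(p: Vec3) -> float:
    # Exact SDF of the unit sphere: negative inside, zero on the surface.
    return math.sqrt(sum(v * v for v in p)) - 1.0


# Usage: 14 tiles centered on points of the unit sphere (6 axis + 8 diagonal directions).
K = 7
r = 1.0 / math.sqrt(3.0)
centers = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
           (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
centers += [(sx * r, sy * r, sz * r) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
grids = [fit_grid(sphere_sdf, c, 0.4, K) for c in centers]
matrix = to_matrix(grids)  # 14 rows x (4 + 7**3) = 347 columns
query = (1.1, 0.02, -0.03)
print(eval_msdf(grids, query), sphere_sdf(query))
```

Each tile depends only on the shape being encoded, so converting a dataset reduces to independent per-shape jobs. Queries that fall inside no tile return the constant `far`; this sketch does not attempt to produce meaningful SDF values away from the surface.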