Mosaic-SDF for 3D Generative Models

Abstract

Current diffusion- or flow-based generative models for 3D shapes fall into two categories: those that distill pre-trained 2D image diffusion models, and those that train directly on 3D shapes. When training a diffusion or flow model on 3D shapes, a crucial design choice is the shape representation. An effective shape representation needs to adhere to three design principles: it should allow efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff between approximation power and number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not adhere to all of these principles simultaneously, we advocate in this paper a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape by using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter efficient, as it only covers the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation with the 3D Warehouse dataset and text-to-3D generation using a dataset of about 600k caption-shape pairs.
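
To make the idea of "a set of local grids spread near the shape's boundary" with "a simple matrix form" concrete, the following is a minimal NumPy sketch. It is an illustration only, not the construction used in the paper: the abstract does not specify the grid resolution, the blending weights, or how grid centers are chosen, so the resolution `k`, the shared `scale`, the hat-shaped weight, the random center sampling, and every function name below are assumptions made for this example. The shape is a sphere so that the exact SDF is available for comparison.

```python
import numpy as np

def sphere_sdf(x, radius=0.5):
    """Exact signed distance to an origin-centred sphere (negative inside)."""
    return np.linalg.norm(x, axis=-1) - radius

def build_local_grids(sdf, centers, scale, k):
    """Sample `sdf` on a k*k*k grid spanning [-scale, scale]^3 around each center."""
    t = np.linspace(-1.0, 1.0, k)
    offsets = np.stack(np.meshgrid(t, t, t, indexing="ij"), axis=-1)  # (k, k, k, 3)
    pts = centers[:, None, None, None, :] + scale * offsets           # (n, k, k, k, 3)
    return sdf(pts)                                                   # (n, k, k, k)

def to_matrix(centers, scale, values):
    """Flatten the representation: one row per grid = [center, scale, grid values]."""
    n = centers.shape[0]
    scales = np.full((n, 1), scale)
    return np.concatenate([centers, scales, values.reshape(n, -1)], axis=1)

def trilinear(values, u):
    """Trilinearly interpolate one (k, k, k) grid at local coordinates u in [-1, 1]^3."""
    k = values.shape[0]
    g = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (k - 1)   # continuous grid index
    i0 = np.minimum(np.floor(g).astype(int), k - 2)     # lower corner of the cell
    f = g - i0                                          # fractional position in the cell
    out = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1.0 - f[0])
                     * (f[1] if dy else 1.0 - f[1])
                     * (f[2] if dz else 1.0 - f[2]))
                out += w * values[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out

def evaluate(x, centers, scale, values):
    """Blend the local interpolants that cover x; returns None if no grid covers x."""
    u = (x - centers) / scale                            # (n, 3) local coordinates
    w = np.maximum(0.0, 1.0 - np.abs(u).max(axis=1))     # assumed hat-shaped weight
    if w.sum() == 0.0:
        return None
    total = sum(w[i] * trilinear(values[i], u[i]) for i in np.nonzero(w)[0])
    return total / w.sum()

# Assumed setup: 64 grid centers sampled on the sphere surface, 7^3 values per grid.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(64, 3))
centers = 0.5 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
values = build_local_grids(sphere_sdf, centers, scale=0.25, k=7)
matrix = to_matrix(centers, 0.25, values)   # shape (64, 3 + 1 + 7**3)
```

In this sketch each shape becomes a fixed-size matrix whose rows can be treated as tokens, which is the property the abstract highlights for Transformer-based architectures. Only the region within `scale` of the sampled boundary points is represented, so queries far from the surface return `None` rather than a distance.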
