MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs

Abstract

In 3D content creation, achieving optimal mesh topology through AI models has long been a pursuit for 3D artists. Previous methods, such as MeshGPT, have explored generating ready-to-use 3D objects via auto-regressive mesh generation. While these methods produce visually impressive results, their reliance on token-by-token prediction leads to significant limitations, including extremely slow generation and an uncontrollable number of mesh faces. In this paper, we introduce MeshCraft, a novel framework for efficient and controllable mesh generation, which leverages continuous spatial diffusion to generate discrete triangle faces. Specifically, MeshCraft consists of two core components: 1) a transformer-based VAE that encodes raw meshes into continuous face-level tokens and decodes them back to the original meshes, and 2) a flow-based diffusion transformer conditioned on the number of faces, enabling the generation of high-quality 3D meshes with a predefined number of faces. By using the diffusion model to generate the entire mesh topology simultaneously, MeshCraft achieves high-fidelity mesh generation significantly faster than auto-regressive methods. Specifically, MeshCraft can generate an 800-face mesh in just 3.2 seconds (35× faster than existing baselines). Extensive experiments demonstrate that MeshCraft outperforms state-of-the-art techniques in both qualitative and quantitative evaluations on the ShapeNet dataset and demonstrates superior performance on the Objaverse dataset. Moreover, it integrates seamlessly with existing conditional guidance strategies, showcasing its potential to relieve artists from the time-consuming manual work involved in mesh creation.
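To illustrate why generating all face tokens at once scales differently from token-by-token prediction, the following toy sketch may help. It is not the MeshCraft implementation: `toy_velocity`, `sample_face_tokens`, the token width of 8, and the 50 Euler steps are all hypothetical choices, and the "velocity" is a closed-form stand-in for a trained flow-based DiT. What it shows is only the general shape of flow-based sampling: the requested face count fixes the number of tokens up front, and every token is updated in parallel at each integration step.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Closed-form stand-in for a trained velocity network (hypothetical).

    For a straight-line path from noise to `target`, this is the velocity
    that carries the current state x to the target exactly at t = 1.
    """
    return (target - x) / (1.0 - t)


def sample_face_tokens(num_faces, token_dim=8, num_steps=50, seed=0):
    """Euler-integrate a flow from noise to face-level tokens.

    num_faces sets the sequence length, so the face count is chosen
    before sampling instead of emerging from an end-of-sequence token.
    """
    rng = np.random.default_rng(seed)
    # Toy "data" for the flow to reach; a real model would not see this.
    target = np.tanh(rng.standard_normal((num_faces, token_dim)))
    # Start from Gaussian noise, one row per triangle face.
    x = rng.standard_normal((num_faces, token_dim))
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # All face tokens move together in a single step.
        x = x + dt * toy_velocity(x, t, target)
    return x, target


tokens, target = sample_face_tokens(num_faces=800)
print(tokens.shape)
```

In this sketch the number of sequential steps is `num_steps` regardless of `num_faces`, whereas an auto-regressive sampler needs a number of sequential steps that grows with the token count. In a full pipeline, the resulting continuous tokens would then be passed to a decoder (the VAE decoder in the abstract's description) to recover discrete triangle faces; that stage is omitted here.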
