MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Abstract

We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses key limitations of existing methods, including the limited dataset scale that results from fitting meshes within LLMs' token-length constraints and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy, which divides 3D meshes into structurally meaningful subunits. This enables the creation of a large-scale dataset with more than 1.5 million samples, almost 50 times larger than those of previous methods and better aligned with LLM scaling-law principles. Furthermore, we propose two training strategies, inferring face connectivity from vertices and local mesh assembly, which significantly enhance the LLMs' ability to capture mesh topology and spatial structure. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its potential for processing text-serialized 3D meshes.
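
To make "text-serialized 3D mesh" concrete, the following is a minimal sketch of one possible serialization. The abstract does not specify the format, so everything here is an assumption for illustration: an OBJ-style layout (`v` lines for vertices, `f` lines for faces), coordinates quantized to integers, a default of 64 bins, and the hypothetical helper names `quantize` and `serialize_mesh`. It is not the authors' implementation.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]  # 0-based vertex indices


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a coordinate in [lo, hi] to an integer in [0, bins - 1]."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo) * (bins - 1)
    return int(round(scaled))


def serialize_mesh(vertices: List[Vertex], faces: List[Face], bins: int = 64) -> str:
    """Serialize a triangle mesh as OBJ-style text with integer coordinates.

    Assumption: a single global min/max over all coordinates is used for
    quantization; the bin count of 64 is an illustrative default.
    """
    lo = min(c for v in vertices for c in v)
    hi = max(c for v in vertices for c in v)
    lines = []
    for v in vertices:
        x, y, z = (quantize(c, lo, hi, bins) for c in v)
        lines.append(f"v {x} {y} {z}")
    for a, b, c in faces:
        # OBJ face indices are 1-based.
        lines.append(f"f {a + 1} {b + 1} {c + 1}")
    return "\n".join(lines)


# A tetrahedron: 4 vertices, 4 triangular faces.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
text = serialize_mesh(tetra_vertices, tetra_faces)
print(text)
```

Every vertex and face adds one line of text, so the serialized length grows with mesh size. This is why whole meshes can exceed an LLM's token limit, and why splitting a mesh into smaller subunits, as the Primitive-Mesh decomposition does, yields more samples that fit.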
