MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
August 2, 2025
Authors: Shuangkang Fang, I-Chao Shen, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Shuchang Zhou, Wenrui Ding, Takeo Igarashi, Ming-Hsuan Yang
cs.AI
Abstract
We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses key limitations of existing methods: the limited dataset scale imposed by LLMs' token-length constraints, and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy that divides 3D meshes into structurally meaningful subunits. This enables the creation of a large-scale dataset with over 1.5 million samples, almost 50 times larger than those of previous methods, which aligns better with LLM scaling-law principles. Furthermore, we propose two training strategies, inferring face connectivity from vertices and assembling local meshes, which significantly enhance the LLMs' ability to capture mesh topology and spatial structure. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its strong potential for processing text-serialized 3D meshes.
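Since the framework operates on text-serialized meshes, a toy sketch helps make the data format concrete. The snippet below serializes a small mesh into OBJ-style `v`/`f` lines, the kind of token sequence an LLM in this setting would read or emit. The function name and the tetrahedron data are illustrative assumptions, not details from the paper.

```python
def serialize_mesh(vertices, faces):
    """Turn vertex coordinates and face indices into OBJ-style text.

    Illustrative sketch only; the paper's actual serialization scheme
    may differ in precision, ordering, and tokenization.
    """
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face records use 1-based vertex indices.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines)

# A single tetrahedron as a toy "primitive" subunit.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
text = serialize_mesh(verts, faces)
```

Dropping the `f` lines from such a sequence and asking the model to restore them is one natural way to realize the face-connectivity-from-vertices objective described above.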