MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
August 2, 2025
Authors: Shuangkang Fang, I-Chao Shen, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Shuchang Zhou, Wenrui Ding, Takeo Igarashi, Ming-Hsuan Yang
cs.AI
Abstract
We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses key limitations of existing methods, including the restricted dataset scale imposed by LLMs' token-length limits and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy that divides 3D meshes into structurally meaningful subunits. This enables the construction of a large-scale dataset with over 1.5 million samples, almost 50 times larger than those of previous methods, aligning better with LLM scaling-law principles. Furthermore, we propose two training strategies, inferring face connectivity from vertices and local mesh assembly, which significantly enhance the LLMs' ability to capture mesh topology and spatial structure. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its great potential for processing text-serialized 3D meshes.
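To make "text-serialized 3D mesh" concrete, the following is a minimal illustrative sketch (not the paper's actual pipeline or data format) of how a triangle mesh can be flattened into OBJ-style plain text, the kind of token sequence LLaMA-Mesh-style methods feed to an LLM; the function name and the tetrahedron example are hypothetical, chosen only for illustration:

```python
def mesh_to_obj_text(vertices, faces):
    """Serialize a triangle mesh into OBJ-style plain text.

    Each vertex becomes a "v x y z" line; each face becomes an
    "f i j k" line with 1-based vertex indices, as in the OBJ format.
    """
    lines = [f"v {x:g} {y:g} {z:g}" for x, y, z in vertices]
    # OBJ face indices are 1-based, so shift the 0-based indices.
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines)

# A tetrahedron: 4 vertices and 4 triangular faces.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
obj_text = mesh_to_obj_text(verts, faces)
print(obj_text)
```

Note how the face lines carry the topology: the abstract's point is that naive serialization like this makes structure implicit and token-hungry, which motivates decomposing meshes into smaller Primitive-Mesh subunits and training the model to recover connectivity from vertices.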