MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Abstract

We present MeshLLM, a novel framework that leverages large language models (LLMs) to understand and generate text-serialized 3D meshes. Our approach addresses two key limitations of existing methods: the limited dataset scale that results from accommodating LLMs' token length, and the loss of 3D structural information during mesh serialization. We introduce a Primitive-Mesh decomposition strategy that divides 3D meshes into structurally meaningful subunits. This enables the construction of a large-scale dataset of more than 1.5 million samples, almost 50 times larger than those of previous methods, which aligns better with LLM scaling laws. Furthermore, we propose two training strategies, inferring face connectivity from vertices and local mesh assembly, which significantly enhance the LLM's ability to capture mesh topology and spatial structure. Experiments show that MeshLLM outperforms the state-of-the-art LLaMA-Mesh in both mesh generation quality and shape understanding, highlighting its potential for processing text-serialized 3D meshes.
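
To illustrate what a text-serialized mesh can look like, the following minimal sketch writes a triangle mesh as OBJ-style text with integer-quantized coordinates and parses it back. This is an assumption made for illustration only: the abstract does not specify the serialization format, and the function names `serialize_mesh` and `parse_mesh`, the `bins` parameter, and the assumption that coordinates lie in [0, 1] are all hypothetical rather than taken from MeshLLM.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]  # 0-based vertex indices


def serialize_mesh(vertices: List[Vertex], faces: List[Face], bins: int = 64) -> str:
    """Write a triangle mesh as OBJ-style text with quantized coordinates.

    Assumes every coordinate lies in [0, 1]; `bins` is a hypothetical
    quantization level chosen only for this example.
    """
    lines = []
    for vertex in vertices:
        # Map each coordinate to an integer bin in [0, bins - 1].
        q = [min(bins - 1, int(c * bins)) for c in vertex]
        lines.append(f"v {q[0]} {q[1]} {q[2]}")
    for a, b, c in faces:
        # OBJ face indices are 1-based.
        lines.append(f"f {a + 1} {b + 1} {c + 1}")
    return "\n".join(lines)


def parse_mesh(text: str):
    """Parse the text produced by serialize_mesh back into integer vertices and faces."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(int(p) for p in parts[1:4]))
        elif parts[0] == "f":
            # Convert back to 0-based indices.
            faces.append(tuple(int(p) - 1 for p in parts[1:4]))
    return vertices, faces


# Usage: a tetrahedron with 4 vertices and 4 triangular faces.
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
mesh_text = serialize_mesh(tetra_vertices, tetra_faces)
```

In a representation of this kind, every vertex and face occupies its own line of text, so the length of the sequence grows with the size of the mesh. This is the kind of token-length pressure that motivates splitting meshes into smaller subunits.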
