MeshWeaver: Autoregressive Mesh Generation via Sparse-Voxel-Guided Surface Weaving

Abstract

Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in a language-modeling fashion. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to high-poly meshes, and (ii) absence of geometry-aware guidance, as generation is conditioned only on global shape embeddings rather than local surface cues. We introduce MeshWeaver, an autoregressive framework that treats mesh generation as a surface weaving process by directly predicting the next vertex instead of independent coordinates. At its core is a multi-level sparse-voxel encoder that injects geometric context into the generative process in three complementary ways: providing voxel features as vertex representations, guiding token prediction via cross-attention to voxel features, and serving as a structural scaffold that constrains generation around the input surface. Our hierarchical design enables coarse-to-fine vertex prediction in a single decoding step, while tightly coupling the generative model with 3D geometry. Extensive experiments demonstrate that MeshWeaver achieves a state-of-the-art compression ratio of 18%, can generate meshes with up to 16K faces, and significantly improves geometric fidelity over prior approaches.
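
To give a rough sense of what an 18% compression ratio means for sequence length, the sketch below works through the arithmetic. It assumes the ratio is measured against a naive tokenization that spends 9 tokens per triangle (3 vertices × 3 quantized coordinates), which is a common baseline in autoregressive mesh generation but is not stated in the abstract. It also reads "16K" as 16,000 faces. The function names are hypothetical and for illustration only.

```python
def naive_token_count(num_faces: int) -> int:
    """Tokens under the assumed baseline: 3 vertices x 3 coordinates per face."""
    return num_faces * 9


def compressed_token_count(num_faces: int, compression_ratio: float) -> int:
    """Tokens after compression, assuming ratio = compressed / naive."""
    return round(naive_token_count(num_faces) * compression_ratio)


if __name__ == "__main__":
    faces = 16_000   # assumed reading of "16K faces"
    ratio = 0.18     # compression ratio reported in the abstract
    print("naive tokens:     ", naive_token_count(faces))
    print("compressed tokens:", compressed_token_count(faces, ratio))
```

Under these assumptions, a 16K-face mesh would shrink from 144,000 tokens to roughly 26,000, which illustrates why tokenization efficiency determines how far such a model can scale.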