MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

Abstract

Autoregressive mesh generation, which tokenizes meshes into sequences and trains models in a language-modeling fashion, has attracted growing attention. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to high-poly meshes, and (ii) a lack of geometry-aware guidance, since generation is conditioned only on global shape embeddings rather than local surface cues. We introduce MeshWeaver, an autoregressive framework that treats mesh generation as a surface weaving process, directly predicting the next vertex instead of independent coordinates. At its core is a multi-level sparse-voxel encoder that injects geometric context into the generative process in three complementary ways: it provides voxel features as vertex representations, guides token prediction via cross-attention to voxel features, and serves as a structural scaffold that constrains generation to the neighborhood of the input surface. This hierarchical design enables coarse-to-fine vertex prediction in a single decoding step while tightly coupling the generative model with 3D geometry. Extensive experiments demonstrate that MeshWeaver achieves a state-of-the-art compression ratio of 18%, generates meshes with up to 16K faces, and significantly improves geometric fidelity over prior approaches.
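
To give a feel for what an 18% compression ratio could mean at 16K faces, the following is a minimal arithmetic sketch. It rests on an assumption the abstract does not state: that the ratio is measured against a naive per-coordinate tokenization using 9 tokens per triangle (3 vertices times 3 coordinates). The function names and the exact face count of 16,000 are illustrative, not taken from the paper.

```python
# Assumption (not stated in the abstract): compression ratio is defined as
# tokens used / tokens of a naive per-coordinate encoding of a triangle mesh.
NAIVE_TOKENS_PER_FACE = 9  # 3 vertices x 3 coordinates per triangle


def naive_token_count(num_faces: int) -> int:
    """Sequence length if every coordinate of every face is its own token."""
    return NAIVE_TOKENS_PER_FACE * num_faces


def compressed_token_count(num_faces: int, ratio: float) -> int:
    """Sequence length at a given compression ratio (hypothetical helper)."""
    return round(naive_token_count(num_faces) * ratio)


if __name__ == "__main__":
    faces = 16_000  # "16K faces", taken here as 16,000 for illustration
    print("naive tokens:     ", naive_token_count(faces))
    print("tokens at 18%:    ", compressed_token_count(faces, 0.18))
```

Under this assumed definition, a 16,000-face mesh would shrink from 144,000 naive tokens to roughly 25,920, which illustrates why tokenization efficiency bears directly on how many faces fit in a fixed context length.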