MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

Abstract

Autoregressive mesh generation, which tokenizes meshes into sequences and trains models on them in a language-modeling fashion, has attracted growing attention. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to high-poly meshes, and (ii) absence of geometry-aware guidance, as generation is conditioned only on global shape embeddings rather than on local surface cues. We introduce MeshWeaver, an autoregressive framework that treats mesh generation as a surface weaving process by directly predicting the next vertex rather than its individual coordinates. At its core is a multi-level sparse-voxel encoder that injects geometric context into the generative process in three complementary ways: providing voxel features as vertex representations, guiding token prediction via cross-attention to voxel features, and serving as a structural scaffold that constrains generation to the vicinity of the input surface. Our hierarchical design enables coarse-to-fine vertex prediction in a single decoding step, while tightly coupling the generative model with 3D geometry. Extensive experiments demonstrate that MeshWeaver achieves a state-of-the-art compression ratio of 18%, can generate meshes with up to 16K faces, and significantly improves geometric fidelity over prior approaches.
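The abstract does not say how a vertex is indexed within the multi-level voxel hierarchy. One way to picture "coarse-to-fine vertex prediction" constrained by a sparse-voxel scaffold is the minimal sketch below. It assumes that a quantized vertex is addressed by its path through an octree, with one octant choice (0 to 7) per level, and that only voxels occupied by the input surface may be chosen at each level. The octree-path encoding, the occupancy mask, and every function name here (`vertex_to_octree_path`, `build_occupancy`, `allowed_octants`) are hypothetical illustrations, not MeshWeaver's actual implementation.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int, int, int]  # hypothetical key: (depth, x, y, z)


def vertex_to_octree_path(x: int, y: int, z: int, levels: int) -> List[int]:
    """Encode a vertex on a 2**levels grid as coarse-to-fine octant ids (assumed scheme)."""
    size = 1 << levels
    for c in (x, y, z):
        if not 0 <= c < size:
            raise ValueError(f"coordinate {c} outside [0, {size})")
    path = []
    for bit in range(levels - 1, -1, -1):  # most significant bit = coarsest level
        # Pack this level's x/y/z bits into a single octant id in 0..7.
        octant = (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)
        path.append(octant)
    return path


def octree_path_to_vertex(path: List[int]) -> Tuple[int, int, int]:
    """Decode an octant path; a partial path yields the coarse cell it has reached."""
    x = y = z = 0
    for octant in path:
        x = (x << 1) | ((octant >> 2) & 1)
        y = (y << 1) | ((octant >> 1) & 1)
        z = (z << 1) | (octant & 1)
    return x, y, z


def build_occupancy(points: List[Tuple[int, int, int]], levels: int) -> Set[Cell]:
    """Mark every cell, at every depth, that contains at least one surface point."""
    occupied: Set[Cell] = set()
    for x, y, z in points:
        path = vertex_to_octree_path(x, y, z, levels)
        for depth in range(1, levels + 1):
            occupied.add((depth, *octree_path_to_vertex(path[:depth])))
    return occupied


def allowed_octants(occupied: Set[Cell], prefix: List[int]) -> List[int]:
    """Octants that keep a partially decoded vertex inside occupied voxels."""
    depth = len(prefix) + 1
    return [o for o in range(8)
            if (depth, *octree_path_to_vertex(prefix + [o])) in occupied]


if __name__ == "__main__":
    print(vertex_to_octree_path(6, 1, 3, levels=3))  # [4, 5, 3]
    occ = build_occupancy([(6, 1, 3), (7, 1, 3)], levels=3)
    print(allowed_octants(occ, []))      # [4]
    print(allowed_octants(occ, [4, 5]))  # [3, 7]
```

In this sketch, a decoder would emit one octant per level, and the occupancy mask would restrict each choice to voxels the input surface actually touches. That is one plausible reading of the "structural scaffold" role. As a rough sense of scale for the 18% figure: if the ratio is measured against a naive 9 coordinate tokens per triangle (a common convention, assumed here), a 16,000-face mesh would need about 16,000 × 9 × 0.18 ≈ 25,920 tokens.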