MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation

Abstract

Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in a language-modeling fashion. However, existing approaches suffer from two fundamental limitations: (i) low tokenization efficiency, which yields long token sequences and prevents scaling to high-poly meshes, and (ii) absence of geometry-aware guidance, as generation is conditioned only on global shape embeddings rather than local surface cues. We introduce MeshWeaver, an autoregressive framework that treats mesh generation as a surface weaving process by directly predicting the next vertex instead of independent coordinates. At its core is a multi-level sparse-voxel encoder that injects geometric context into the generative process in three complementary ways: providing voxel features as vertex representations, guiding token prediction via cross-attention to voxel features, and serving as a structural scaffold that constrains generation around the input surface. Our hierarchical design enables coarse-to-fine vertex prediction in a single decoding step, while tightly coupling the generative model with 3D geometry. Extensive experiments demonstrate that MeshWeaver achieves a state-of-the-art compression ratio of 18%, can generate meshes with up to 16K faces, and significantly improves geometric fidelity over prior approaches.
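
To give a sense of scale for the reported numbers, the sketch below converts a compression ratio into an approximate sequence length. It assumes the convention common in autoregressive mesh generation work, where the ratio is measured against a naive tokenization that spends 9 tokens per triangle (3 vertices × 3 coordinates). The abstract does not state its exact definition, so the baseline, the function names, and the reading of "16K" as 16,000 faces are all illustrative assumptions.

```python
def naive_token_count(num_faces: int) -> int:
    """Token count under the assumed naive baseline: 3 vertices x 3 coordinates per face."""
    return num_faces * 3 * 3


def tokens_at_ratio(num_faces: int, compression_ratio: float) -> int:
    """Approximate sequence length at a given compression ratio (hypothetical helper)."""
    return round(naive_token_count(num_faces) * compression_ratio)


faces = 16_000  # assumption: "16K faces" read as 16,000
print(naive_token_count(faces))      # 144000
print(tokens_at_ratio(faces, 0.18))  # 25920
```

Under these assumptions, an 18% ratio would shrink a 16,000-face mesh from 144,000 naive tokens to roughly 26,000, which is the kind of reduction that makes high-poly meshes fit in a practical context window.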