MeshFlow: Mesh Generation with Equivariant Flow Matching

Abstract

Meshes are among the most common 3D scene representations, but generating them directly is challenging because the representation contains important symmetries, including permutation invariance of faces and vertices. MeshFlow learns to generate triangle meshes directly as triangle soups, avoiding the need to serialize meshes into long autoregressive sequences. We adopt equivariant optimal-transport flow matching models that respect the key symmetries of triangle soups: arbitrary permutations of faces and permutations of the vertices within each face. Toward this goal, we propose a simple yet effective modification to the Diffusion Transformer architecture, resulting in a scalable network capable of modeling a velocity field while maintaining the desired equivariance. We further introduce an optimal-transport-based training objective that improves convergence by eliminating supervision signals that violate these symmetries. MeshFlow achieves mesh quality comparable to state-of-the-art autoregressive mesh generators while providing about an 18× speedup during inference. The project page is at https://qiisun.github.io/MeshFlow/.
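
To make the symmetry argument concrete, the following is a minimal sketch of how a noise sample could be paired with a target triangle soup so that the flow matching target does not depend on face order or on vertex order within a face. It is an illustration under stated assumptions, not the MeshFlow implementation: the function names are hypothetical, all six vertex orderings per face are treated as equivalent, the interpolant is assumed to be a straight line (so the velocity target is x1 - x0), and the face assignment is found by brute force, which only works for tiny soups (a practical version would likely use a linear assignment solver).

```python
from itertools import permutations

# A face is a tuple of 3 vertices; a vertex is a tuple of 3 floats.
# A triangle soup is a list of faces. All names below are hypothetical.


def sq_dist(face_a, face_b):
    """Sum of squared coordinate differences between two faces."""
    return sum(
        (a - b) ** 2
        for va, vb in zip(face_a, face_b)
        for a, b in zip(va, vb)
    )


def best_vertex_order(noise_face, data_face):
    """Reorder the vertices of data_face to lie closest to noise_face.

    Assumption: all 6 vertex orderings describe the same triangle.
    """
    candidates = [
        tuple(data_face[i] for i in p) for p in permutations(range(3))
    ]
    return min(candidates, key=lambda f: sq_dist(noise_face, f))


def align_soup(noise, data):
    """Choose the face order and per-face vertex order of `data` that
    minimize the squared distance to `noise`.

    Brute force over face permutations, so only suitable for tiny soups.
    """
    best_cost, best_aligned = float("inf"), None
    for perm in permutations(range(len(data))):
        aligned = [
            best_vertex_order(nf, data[j]) for nf, j in zip(noise, perm)
        ]
        cost = sum(sq_dist(nf, df) for nf, df in zip(noise, aligned))
        if cost < best_cost:
            best_cost, best_aligned = cost, aligned
    return best_aligned, best_cost


def velocity_target(noise, aligned):
    """Straight-line flow matching target x1 - x0, per coordinate."""
    return [
        tuple(
            tuple(b - a for a, b in zip(va, vb))
            for va, vb in zip(nf, df)
        )
        for nf, df in zip(noise, aligned)
    ]
```

Calling `align_soup(noise, data)` and then `velocity_target(noise, aligned)` yields a regression target that is unchanged when the faces of `data` are shuffled or the vertices inside a face are reordered, which is the kind of symmetry-violating supervision signal the abstract describes eliminating.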