iLRM: An Iterative Large 3D Reconstruction Model

Abstract

Feed-forward 3D modeling has emerged as a promising approach for rapid, high-quality 3D reconstruction. In particular, directly generating explicit 3D representations, such as 3D Gaussian splatting, has attracted significant attention because of its fast, high-quality rendering and its many applications. However, many state-of-the-art methods, primarily those based on transformer architectures, suffer from severe scalability issues: they rely on full attention across image tokens from multiple input views, so computational cost becomes prohibitive as the number of views or the image resolution increases. Toward scalable and efficient feed-forward 3D reconstruction, we introduce an iterative Large 3D Reconstruction Model (iLRM) that generates 3D Gaussian representations through an iterative refinement mechanism, guided by three core principles: (1) decoupling the scene representation from input-view images to enable compact 3D representations; (2) decomposing fully-attentional multi-view interactions into a two-stage attention scheme to reduce computational costs; and (3) injecting high-resolution information at every layer to achieve high-fidelity reconstruction. Experimental results on widely used datasets, such as RE10K and DL3DV, demonstrate that iLRM outperforms existing methods in both reconstruction quality and speed. Notably, iLRM exhibits superior scalability: by efficiently leveraging a larger number of input views, it delivers significantly higher reconstruction quality at comparable computational cost.
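To make the scalability argument concrete, the sketch below counts query-key pairs for full attention over all image tokens and compares it with one possible two-stage decomposition. The abstract does not specify the two stages, so the split used here (per-view cross-attention from a compact set of scene tokens to that view's image tokens, followed by self-attention among the compact tokens of all views) is an assumption for illustration only. The function names and the token counts (4096 image tokens and 256 scene tokens per view) are likewise hypothetical.

```python
def full_attention_pairs(num_views: int, tokens_per_view: int) -> int:
    """Query-key pairs when every image token attends to every image token of every view."""
    total_tokens = num_views * tokens_per_view
    return total_tokens * total_tokens


def two_stage_attention_pairs(
    num_views: int, tokens_per_view: int, scene_tokens_per_view: int
) -> int:
    """Query-key pairs for a hypothetical two-stage scheme.

    This is an assumed decomposition, not the paper's confirmed design:
      stage 1: each view's compact scene tokens cross-attend to that view's image tokens only;
      stage 2: the compact scene tokens of all views self-attend to one another.
    """
    per_view_cross = num_views * scene_tokens_per_view * tokens_per_view
    total_scene_tokens = num_views * scene_tokens_per_view
    global_self = total_scene_tokens * total_scene_tokens
    return per_view_cross + global_self


if __name__ == "__main__":
    # Illustrative sizes only: 4096 image tokens and 256 scene tokens per view.
    for views in (2, 8, 32):
        full = full_attention_pairs(views, 4096)
        staged = two_stage_attention_pairs(views, 4096, 256)
        print(f"views={views:2d}  full={full:,}  two_stage={staged:,}")
```

Under these assumptions, the full-attention count quadruples each time the number of views doubles. In the two-stage count, the only quadratic term runs over the much smaller set of compact tokens, which is the kind of saving that principles (1) and (2) point to.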
