Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

Abstract

Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications that require accurate and scalable representations across diverse viewpoints. Current leading methods such as DUSt3R are fundamentally pairwise: they process images two at a time and need a costly global alignment procedure to reconstruct a scene from more than two views. In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization of DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R's Transformer-based architecture processes N images in a single forward pass, bypassing the need for iterative alignment. Through extensive experiments on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation. These results establish Fast3R as a robust alternative for multi-view applications, offering enhanced scalability without compromising reconstruction accuracy.
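The scaling argument behind the pairwise versus single-pass contrast can be illustrated with a small count. The sketch below is not taken from Fast3R or DUSt3R code. It assumes, for illustration only, that a pairwise method runs the network once for every unordered pair of views, and the function names are hypothetical.

```python
def pairwise_pass_count(num_views: int) -> int:
    """Network passes if every unordered pair of views is processed once.

    Assumption for illustration: a complete pair graph, N * (N - 1) / 2 pairs.
    Real pairwise pipelines may use sparser or denser pair graphs.
    """
    if num_views < 2:
        return 0
    return num_views * (num_views - 1) // 2


def single_pass_count(num_views: int) -> int:
    """Network passes if all views are fed to the model together."""
    return 1 if num_views > 0 else 0


if __name__ == "__main__":
    # Compare the two counts as the number of views grows.
    for n in (2, 10, 100, 1000):
        print(f"N={n}: pairwise={pairwise_pass_count(n)}, single={single_pass_count(n)}")
```

This counts network invocations only. It says nothing about the cost of each pass, and it leaves out the global alignment step that a pairwise method needs afterwards.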
