Dens3R: A Foundation Model for 3D Geometry Prediction

Abstract

Recent advances in dense 3D reconstruction have led to significant progress, yet achieving accurate, unified geometric prediction remains a major challenge. Most existing methods are limited to predicting a single geometric quantity from input images. However, geometric quantities such as depth, surface normals, and pointmaps are inherently correlated, and estimating them in isolation often fails to ensure consistency, thereby limiting both accuracy and practical applicability. This motivates us to explore a unified framework that explicitly models the structural coupling among different geometric properties to enable their joint regression. In this paper, we present Dens3R, a 3D foundation model designed for joint geometric dense prediction and adaptable to a wide range of downstream tasks. Dens3R adopts a two-stage training framework to progressively build a pointmap representation that is both generalizable and intrinsically invariant. Specifically, we design a lightweight shared encoder-decoder backbone and introduce position-interpolated rotary positional encoding to maintain expressive power while enhancing robustness to high-resolution inputs. By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities such as surface normals and depth, achieving consistent geometry perception from single-view to multi-view inputs. Additionally, we propose a post-processing pipeline that supports geometrically consistent multi-view inference. Extensive experiments demonstrate the superior performance of Dens3R across various dense 3D prediction tasks and highlight its potential for broader applications.
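
The coupling between depth, pointmaps, and surface normals mentioned in the abstract can be made concrete with a short sketch. Assuming a pinhole camera with known intrinsics (fx, fy, cx, cy) and a pointmap expressed in the camera frame, depth is simply the z channel of the pointmap, and normals follow from the cross product of the pointmap's spatial derivatives. The function names below are hypothetical and this is a generic geometric illustration, not Dens3R's architecture or training code; the abstract describes Dens3R as regressing these quantities jointly with a network, and the sketch only shows why independently predicted maps can disagree with one another.

```python
import numpy as np


def depth_to_pointmap(depth: np.ndarray, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Back-project an HxW depth map to an HxWx3 camera-frame pointmap.

    Assumes a simple pinhole model (no lens distortion); the intrinsics
    are illustrative parameters, not values from the paper.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def pointmap_to_depth(points: np.ndarray) -> np.ndarray:
    """Depth is the z coordinate of each camera-frame 3D point."""
    return points[..., 2]


def pointmap_to_normals(points: np.ndarray) -> np.ndarray:
    """Estimate per-pixel unit normals from a pointmap.

    Uses finite differences (central in the interior, one-sided at the
    borders, as computed by np.gradient) and a cross product of the two
    tangent vectors.
    """
    # Derivatives with respect to rows (v) and columns (u).
    d_dv, d_du = np.gradient(points, axis=(0, 1))
    n = np.cross(d_du, d_dv)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    n = n / np.clip(norm, 1e-12, None)
    # Camera looks along +z; flip normals so they face the camera (z < 0).
    facing_away = n[..., 2:3] > 0
    return np.where(facing_away, -n, n)


if __name__ == "__main__":
    # A fronto-parallel plane 2 units in front of the camera.
    depth = np.full((4, 5), 2.0)
    pts = depth_to_pointmap(depth, 100.0, 100.0, 2.0, 1.5)
    print(pts.shape)
    print(pointmap_to_normals(pts)[0, 0])
```

For a fronto-parallel plane, this convention yields the normal (0, 0, -1) at every pixel, and the depth recovered from the pointmap matches the input exactly. If a separately predicted normal map disagreed with the normals implied by a predicted depth map, that mismatch would be the kind of cross-quantity inconsistency the abstract attributes to estimating each quantity in isolation.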