Dens3R: A Foundation Model for 3D Geometry Prediction
July 22, 2025
Authors: Xianze Fang, Jingnan Gao, Zhe Wang, Zhuo Chen, Xingyu Ren, Jiangjing Lyu, Qiaomu Ren, Zhonglei Yang, Xiaokang Yang, Yichao Yan, Chengfei Lyu
cs.AI
Abstract
Recent advances in dense 3D reconstruction have led to significant progress,
yet achieving accurate unified geometric prediction remains a major challenge.
Most existing methods are limited to predicting a single geometric quantity from
input images. However, geometric quantities such as depth, surface normals, and
point maps are inherently correlated, and estimating them in isolation often
fails to ensure consistency, thereby limiting both accuracy and practical
applicability. This motivates us to explore a unified framework that explicitly
models the structural coupling among different geometric properties to enable
joint regression. In this paper, we present Dens3R, a 3D foundation model
designed for joint geometric dense prediction and adaptable to a wide range of
downstream tasks. Dens3R adopts a two-stage training framework to progressively
build a pointmap representation that is both generalizable and intrinsically
invariant. Specifically, we design a lightweight shared encoder-decoder
backbone and introduce position-interpolated rotary positional encoding to
maintain expressive power while enhancing robustness to high-resolution inputs.
By integrating image-pair matching features with intrinsic invariance modeling,
Dens3R accurately regresses multiple geometric quantities such as surface
normals and depth, achieving consistent geometry perception from single-view to
multi-view inputs. Additionally, we propose a post-processing pipeline that
supports geometrically consistent multi-view inference. Extensive experiments
demonstrate the superior performance of Dens3R across various dense 3D
prediction tasks and highlight its potential for broader applications.
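The abstract argues that depth, surface normals, and pointmaps are structurally coupled. The paper provides no code here, but the coupling it refers to is standard multi-view geometry: a pointmap is a depth map back-projected through the camera intrinsics, and normals follow from the pointmap's spatial gradients. The NumPy sketch below illustrates that relationship; the function names and the simple finite-difference normal estimator are illustrative choices, not Dens3R's implementation.

```python
import numpy as np

def depth_to_pointmap(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project a depth map (H, W) into a camera-frame pointmap (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))    # per-pixel coordinates
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels [u, v, 1]
    rays = pix @ np.linalg.inv(K).T                   # viewing rays K^{-1} [u, v, 1]^T
    return rays * depth[..., None]                    # scale each ray by its depth

def pointmap_to_normals(points: np.ndarray) -> np.ndarray:
    """Estimate surface normals (H, W, 3) from a pointmap via spatial gradients."""
    dpdu = np.gradient(points, axis=1)                # finite difference along image x
    dpdv = np.gradient(points, axis=0)                # finite difference along image y
    n = np.cross(dpdu, dpdv)                          # normal = cross of tangents
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)
```

Because both quantities derive from the same underlying surface, predicting them jointly (as Dens3R does) rather than in isolation is what makes cross-quantity consistency enforceable.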
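The "position-interpolated rotary positional encoding" mentioned in the abstract plausibly combines rotary position embeddings (RoPE) with position interpolation, i.e., rescaling token positions so that a higher-resolution input grid stays within the position range seen during training. The following is a minimal 1D sketch under that reading; `train_len`, `test_len`, and the concatenated (rather than interleaved) rotation layout are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def rope_angles(positions: np.ndarray, dim: int, base: float = 10000.0,
                train_len: int = 512, test_len: int = 512) -> np.ndarray:
    """Rotary-embedding angles with position interpolation: positions from a
    longer (higher-resolution) grid are rescaled into the trained range."""
    scale = train_len / max(test_len, train_len)      # <= 1 when test_len > train_len
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # per-pair rotation frequencies
    return np.outer(positions * scale, inv_freq)      # (num_positions, dim / 2)

def apply_rope(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate feature pairs (x_even, x_odd) of x (num_positions, dim) by angles."""
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    return np.concatenate([x_even * cos - x_odd * sin,
                           x_even * sin + x_odd * cos], axis=-1)
```

Under this scheme the rotation angles at test resolution never exceed those seen in training, which is one plausible reading of how the encoding "maintains expressive power while enhancing robustness to high-resolution inputs."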