
Dens3R: A Foundation Model for 3D Geometry Prediction

July 22, 2025
作者: Xianze Fang, Jingnan Gao, Zhe Wang, Zhuo Chen, Xingyu Ren, Jiangjing Lyu, Qiaomu Ren, Zhonglei Yang, Xiaokang Yang, Yichao Yan, Chengfei Lyu
cs.AI

Abstract

Recent advances in dense 3D reconstruction have led to significant progress, yet achieving accurate unified geometric prediction remains a major challenge. Most existing methods are limited to predicting a single geometry quantity from input images. However, geometric quantities such as depth, surface normals, and point maps are inherently correlated, and estimating them in isolation often fails to ensure consistency, thereby limiting both accuracy and practical applicability. This motivates us to explore a unified framework that explicitly models the structural coupling among different geometric properties to enable joint regression. In this paper, we present Dens3R, a 3D foundation model designed for joint geometric dense prediction and adaptable to a wide range of downstream tasks. Dens3R adopts a two-stage training framework to progressively build a pointmap representation that is both generalizable and intrinsically invariant. Specifically, we design a lightweight shared encoder-decoder backbone and introduce position-interpolated rotary positional encoding to maintain expressive power while enhancing robustness to high-resolution inputs. By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities such as surface normals and depth, achieving consistent geometry perception from single-view to multi-view inputs. Additionally, we propose a post-processing pipeline that supports geometrically consistent multi-view inference. Extensive experiments demonstrate the superior performance of Dens3R across various dense 3D prediction tasks and highlight its potential for broader applications.
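The abstract mentions position-interpolated rotary positional encoding as the mechanism for staying robust to high-resolution inputs. The paper's exact formulation is not given here; below is a minimal 1D sketch, assuming standard RoPE with linear position interpolation (all function names are illustrative, and Dens3R would apply the idea over 2D image token grids rather than a 1D sequence):

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Per-position rotation angles for rotary positional encoding (RoPE)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)  # shape (num_positions, dim/2)

def apply_rope(x, positions, train_len=None):
    """Rotate consecutive feature pairs of x by position-dependent angles.

    If train_len is given and the input is longer, positions are linearly
    rescaled into the training range (position interpolation), so the
    rotation frequencies stay within those seen at training resolution.
    """
    seq_len, dim = x.shape
    positions = np.asarray(positions, dtype=np.float64)
    if train_len is not None and seq_len > train_len:
        positions = positions * (train_len / seq_len)  # position interpolation
    ang = rope_angles(positions, dim)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each feature pair is only rotated, vector norms are preserved, and a longer (higher-resolution) input reuses the angle range of the training length instead of extrapolating to unseen frequencies.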