
π^3: Scalable Permutation-Equivariant Visual Geometry Learning

July 17, 2025
作者: Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He
cs.AI

Abstract

We introduce pi^3, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, pi^3 employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.
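The key property the abstract claims is permutation equivariance: permuting the input views permutes the outputs identically, so no view is privileged as a reference. This is not the π^3 architecture itself, but a minimal numpy sketch showing why self-attention without positional encoding has this property (all function and variable names here are illustrative assumptions, not from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Self-attention with no positional encoding: permuting the rows
    # (views) of X permutes the output rows in exactly the same way,
    # i.e. f(P @ X) == P @ f(X) for any permutation matrix P.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

rng = np.random.default_rng(0)
n_views, d = 5, 8                      # toy sizes: 5 "views", 8 features
X = rng.normal(size=(n_views, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n_views)
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)
assert np.allclose(out[perm], out_perm)  # equivariance holds
```

Because attention weights depend only on pairwise feature similarity, permuting the rows of `X` turns the attention matrix `A` into `P A Pᵀ`, and the output into `P (A V)`. A model built entirely from such layers is therefore robust to input ordering, which is the inductive-bias-free design the abstract describes.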