π^3: Scalable Permutation-Equivariant Visual Geometry Learning
July 17, 2025
Authors: Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He
cs.AI
Abstract
We introduce π^3, a feed-forward neural network that offers a novel
approach to visual geometry reconstruction, breaking the reliance on a
conventional fixed reference view. Previous methods often anchor their
reconstructions to a designated viewpoint, an inductive bias that can lead to
instability and failures if the reference is suboptimal. In contrast, π^3
employs a fully permutation-equivariant architecture to predict
affine-invariant camera poses and scale-invariant local point maps without any
reference frames. This design makes our model inherently robust to input
ordering and highly scalable. These advantages enable our simple and bias-free
approach to achieve state-of-the-art performance on a wide range of tasks,
including camera pose estimation, monocular/video depth estimation, and dense
point map reconstruction. Code and models are publicly available.
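
For clarity, the permutation-equivariance property the abstract refers to can be stated formally; the notation below ($x_i$, $\sigma$, $f$, $y_i$) is ours and not taken from the paper. For an input sequence of $N$ views $X = (x_1, \dots, x_N)$ and any permutation $\sigma$ of $\{1, \dots, N\}$, a permutation-equivariant predictor $f$ satisfies

$$ f\big(x_{\sigma(1)}, \dots, x_{\sigma(N)}\big) = \big(y_{\sigma(1)}, \dots, y_{\sigma(N)}\big), \qquad \text{where } (y_1, \dots, y_N) = f(x_1, \dots, x_N), $$

i.e., reordering the input views only reorders the per-view outputs (here, camera poses and local point maps) without changing their values, in contrast to reference-view methods whose reconstruction depends on which view is designated as the anchor.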