ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

Abstract

We present ViSTA-SLAM, a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-view association (STA) model as the frontend, which simultaneously estimates relative camera poses and regresses local pointmaps from only two RGB images. This design significantly reduces model complexity: our frontend is only 35% the size of comparable state-of-the-art methods, while the quality of the two-view constraints used in the pipeline improves. In the backend, we construct a specially designed Sim(3) pose graph that incorporates loop closures to address accumulated drift. Extensive experiments demonstrate that our approach outperforms current methods in both camera tracking and dense 3D reconstruction quality. GitHub repository: https://github.com/zhangganlin/vista-slam
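
To illustrate why a Sim(3) pose graph, rather than a rigid SE(3) one, suits an intrinsics-free monocular setting, the sketch below composes relative similarity transforms around a closed loop and measures how far the result is from the identity. It is a generic, minimal illustration and not the authors' implementation: the (scale, rotation, translation) tuple layout, the helper names `compose`, `loop_error`, and `square_loop`, and the 2% per-edge scale error are all assumptions made for this example.

```python
import math

# A Sim(3) element is stored as (s, R, t) and maps a point x to s * R x + t.
# R is a 3x3 rotation given as nested lists, t is a 3-vector.

def mat_vec(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def compose(a, b):
    """Return the transform that applies b first, then a."""
    sa, Ra, ta = a
    sb, Rb, tb = b
    rotated = mat_vec(Ra, tb)
    return (sa * sb,
            mat_mul(Ra, Rb),
            [sa * rotated[i] + ta[i] for i in range(3)])

def loop_error(edges):
    """Compose relative edges around a loop.

    Returns (log-scale error, translation error). Both are zero when the
    loop closes perfectly. Rotation error is ignored to keep the sketch short.
    """
    total = (1.0, rot_z(0.0), [0.0, 0.0, 0.0])
    for edge in edges:
        total = compose(total, edge)
    s, _, t = total
    return math.log(s), math.sqrt(sum(c * c for c in t))

def square_loop(scale_per_edge):
    """Four identical edges: move one unit forward, then turn 90 degrees."""
    edge = (scale_per_edge, rot_z(math.pi / 2), [1.0, 0.0, 0.0])
    return [edge] * 4

if __name__ == "__main__":
    print("consistent scale:", loop_error(square_loop(1.0)))
    print("2% scale error per edge:", loop_error(square_loop(1.02)))
```

With a consistent scale the loop closes and both errors vanish up to floating-point noise. With a small per-edge scale error the log-scale term grows linearly with the number of edges, which is the kind of accumulated drift a loop-closure edge in a Sim(3) graph lets an optimizer redistribute; an SE(3) graph has no scale variable to absorb it.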
