CylinderDepth: Cylindrical Spatial Attention for Multi-View Consistent Self-Supervised Surround Depth Estimation

Abstract

Self-supervised surround-view depth estimation enables dense, low-cost 3D perception with a 360° field of view from multiple minimally overlapping images. Yet most existing methods suffer from depth estimates that are inconsistent across overlapping images. To address this limitation, we propose a novel geometry-guided method for calibrated, time-synchronized multi-camera rigs that predicts dense metric depth. Our approach targets two main sources of inconsistency: the limited receptive field in border regions of single-image depth estimation, and the difficulty of correspondence matching. We mitigate these two issues by extending the receptive field across views and restricting cross-view attention to a small neighborhood. To this end, we establish the neighborhood relationships between images by mapping the image-specific feature positions onto a shared cylinder. Based on the cylindrical positions, we apply an explicit spatial attention mechanism, with non-learned weighting, that aggregates features across images according to their distances on the cylinder. The modulated features are then decoded into a depth map for each view. Evaluated on the DDAD and nuScenes datasets, our method improves both cross-view depth consistency and overall depth accuracy compared with state-of-the-art approaches. Code is available at https://abualhanud.github.io/CylinderDepthPage.
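The cylinder mapping and distance-based aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how feature positions are lifted to 3D, which weighting function is used, or how large the neighborhood is. The sketch assumes that each feature already has a 3D position in a shared rig frame, uses a Gaussian kernel as a stand-in for the non-learned weighting, and introduces the hypothetical names `to_cylinder`, `cylinder_distance`, `aggregate`, `sigma`, and `max_dist` purely for illustration.

```python
import math


def to_cylinder(point, radius=1.0):
    """Project a 3D point (x, y, z) in a shared rig frame onto a vertical cylinder.

    Returns (azimuth in radians, height), where the height is the point at which
    the ray from the origin through the 3D point crosses the cylinder surface.
    Assumes the point does not lie on the cylinder axis (x and y not both zero).
    """
    x, y, z = point
    rho = math.hypot(x, y)      # horizontal distance from the cylinder axis
    theta = math.atan2(y, x)    # azimuth around the axis, in (-pi, pi]
    return theta, radius * z / rho


def cylinder_distance(a, b, radius=1.0):
    """Distance between two cylinder positions; the azimuth wraps around at +/-pi."""
    dtheta = (a[0] - b[0] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(radius * dtheta, a[1] - b[1])


def aggregate(positions, features, sigma=0.1, max_dist=0.3, radius=1.0):
    """Replace each feature with a distance-weighted mean of its cylinder neighbors.

    The weights are fixed functions of distance (nothing is learned). The Gaussian
    kernel and the hard cutoff at max_dist are assumptions made for this sketch.
    """
    dim = len(features[0])
    out = []
    for p in positions:
        weights = []
        for q in positions:
            d = cylinder_distance(p, q, radius)
            inside = d <= max_dist  # restrict attention to a small neighborhood
            weights.append(math.exp(-0.5 * (d / sigma) ** 2) if inside else 0.0)
        total = sum(weights)  # positive: every position has distance 0 to itself
        out.append([
            sum(w * f[k] for w, f in zip(weights, features)) / total
            for k in range(dim)
        ])
    return out
```

In this sketch, features from different cameras that land on the same cylinder position are averaged, while features outside the neighborhood are left out of the sum entirely. That mirrors the idea of extending the receptive field across views while keeping cross-view attention local.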
