SA4Depth: Consistent Pose-Depth Scale Alignment for Self-Supervised Monocular Depth Estimation

Abstract

Self-supervised depth estimation from monocular sequences relies on jointly learning a depth network and a pose network. Although much research has focused on improving the depth network, the pose network has received limited attention. We highlight the importance of aligning the scene scales estimated by the pose and depth networks, even when depth is only estimated up to scale. We then introduce SA4Depth, an approach that improves this alignment and boosts depth predictions while keeping inference time unchanged. During training, the proposed method uses the estimated depth to reproject learnable visual features across consecutive frames, and refines the pose estimates by reducing the feature alignment residuals. With this method, the scene scales estimated by the separate depth and pose networks are aligned, and the scale of the predictions is more consistent across different sequences. The differentiable refinement integrates seamlessly into existing self-supervised pipelines and substantially improves their depth estimates. We demonstrate this with extensive outdoor and indoor experiments on KITTI, Cityscapes, and NYUv2. Results on KITTI Odometry further confirm the effectiveness of the pose refinement. Our code is available at https://github.com/Runningchauncey/SA4Depth.
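To make the role of scale alignment concrete, the following is a minimal NumPy sketch of the standard pinhole reprojection that self-supervised pipelines of this kind typically build on. It is an illustration under stated assumptions, not the SA4Depth implementation: the function name `reproject`, the intrinsics, the pose, and the depth values are all hypothetical, and it warps raw pixel coordinates rather than learned features. It shows that scaling depth and translation by the same factor leaves the reprojection unchanged, whereas scaling only the depth shifts the reprojected pixels.

```python
import numpy as np

def reproject(pixels, depth, K, R, t):
    """Warp pixel coordinates from a source frame into a target frame.

    pixels: (N, 2) array of (u, v) coordinates in the source frame
    depth:  (N,) array of depths for those pixels
    K:      (3, 3) camera intrinsics
    R, t:   rotation (3, 3) and translation (3,) from source to target camera
    """
    ones = np.ones((pixels.shape[0], 1))
    homogeneous = np.hstack([pixels, ones])       # (N, 3) homogeneous pixels
    rays = homogeneous @ np.linalg.inv(K).T       # back-project to unit-depth rays
    points = rays * depth[:, None]                # 3D points in the source frame
    moved = points @ R.T + t                      # same points in the target frame
    projected = moved @ K.T                       # project with the intrinsics
    return projected[:, :2] / projected[:, 2:3]   # perspective division

# Hypothetical camera, pose, and depths, chosen only for illustration.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.5])
pixels = np.array([[100.0, 120.0], [400.0, 300.0]])
depth = np.array([5.0, 12.0])

s = 2.0  # arbitrary global scale factor
reference = reproject(pixels, depth, K, R, t)
aligned = reproject(pixels, s * depth, K, R, s * t)   # depth and pose share the scale
misaligned = reproject(pixels, s * depth, K, R, t)    # only the depth is rescaled

aligned_error = np.abs(aligned - reference).max()
misaligned_error = np.abs(misaligned - reference).max()
print("max pixel shift, aligned scales:   ", aligned_error)
print("max pixel shift, misaligned scales:", misaligned_error)
```

In this toy setting the aligned case reproduces the reference pixels up to floating-point error, while the misaligned case does not. A residual defined on such reprojections is therefore sensitive to a scale mismatch between depth and translation, which is the kind of signal a pose refinement step could reduce.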
