Background Prompting for Improved Object Depth

Abstract

Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications. However, current methods often fail to produce accurate depth for objects in diverse scenes. In this work, we propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background. We learn the background prompts using only small-scale synthetic object datasets. To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks. Background Prompting helps the depth networks focus on the foreground object by making them invariant to background variations. Moreover, Background Prompting minimizes the domain gap between synthetic and real object images, leading to better sim2real generalization than simple finetuning. Results on multiple synthetic and real datasets demonstrate consistent improvements in real-object depth for a variety of existing depth networks. Code and optimized background prompts can be found at: https://mbaradad.github.io/depth_prompt.
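The inference step described in the abstract (segment the object, paste it onto the learned background prompt, run an off-the-shelf depth network) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' released code: images are single-channel H x W nested lists of floats, the binary mask marks object pixels with 1, the prompt already has the same resolution as the input, and `depth_net` is a hypothetical stand-in for any monocular depth network. Reading depth only at object pixels is likewise our assumption.

```python
from typing import Callable, List, Optional

# Assumed representation (not from the paper): single-channel images as
# H x W nested lists of floats, and a binary mask with 1 = object pixel.
Image = List[List[float]]
Mask = List[List[int]]


def composite_on_prompt(image: Image, mask: Mask, prompt: Image) -> Image:
    """Keep object pixels from `image`; take every background pixel from the
    learned background `prompt`. All three inputs must share one H x W size."""
    if not (len(image) == len(mask) == len(prompt)):
        raise ValueError("image, mask and prompt must have the same height")
    out: Image = []
    for img_row, mask_row, prompt_row in zip(image, mask, prompt):
        if not (len(img_row) == len(mask_row) == len(prompt_row)):
            raise ValueError("image, mask and prompt must have the same width")
        out.append([px if m else bg
                    for px, m, bg in zip(img_row, mask_row, prompt_row)])
    return out


def estimate_object_depth(
    image: Image,
    mask: Mask,
    prompt: Image,
    depth_net: Callable[[Image], Image],  # hypothetical off-the-shelf depth model
) -> List[List[Optional[float]]]:
    """Run the (frozen) depth network on the prompted image and keep its
    predictions only on object pixels; None marks background (our assumption)."""
    depth = depth_net(composite_on_prompt(image, mask, prompt))
    return [[d if m else None for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]
```

For example, with `image = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]`, `mask = [[0, 1, 0], [0, 1, 1]]` and `prompt = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]`, `composite_on_prompt` returns `[[0.1, 0.8, 0.3], [0.4, 0.5, 0.4]]`: the object pixels survive and every background pixel comes from the prompt, so the depth network always sees the same background regardless of the scene the object came from. With a soft (alpha) mask one would instead blend `m * x + (1 - m) * b` per pixel.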
