Background Prompting for Improved Object Depth

Abstract

Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications. However, current methods often fail to produce accurate depth for objects in diverse scenes. In this work, we propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background. We learn the background prompts only using small-scale synthetic object datasets. To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks. Background Prompting helps the depth networks focus on the foreground object, as they are made invariant to background variations. Moreover, Background Prompting minimizes the domain gap between synthetic and real object images, leading to better sim2real generalization than simple finetuning. Results on multiple synthetic and real datasets demonstrate consistent improvements in real object depths for a variety of existing depth networks. Code and optimized background prompts can be found at: https://mbaradad.github.io/depth_prompt.
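The two steps above (learn one background on synthetic objects, then composite a segmented object onto it before running a frozen depth network) can be illustrated with a minimal pure-Python sketch. It rests on loud assumptions that do not come from the paper: the "depth network" is a hypothetical frozen linear map (`toy_depth_net`) standing in for an off-the-shelf model, images are flat lists of grayscale values, and the loss is a plain squared error over object pixels with hand-written gradients. The names `composite`, `loss_and_grad`, `mean_loss` and `learn_background` are illustrative only and are not taken from the released code. The sketch shows the structure of the idea: only the background is optimized, the network stays frozen, the loss is measured on object pixels, and a single background is shared across all training objects.

```python
def composite(image, mask, background):
    """Paste the segmented object onto the background prompt.

    All arguments are flat lists of equal length; mask is 1.0 on object
    pixels and 0.0 on background pixels.
    """
    return [m * x + (1.0 - m) * b for x, m, b in zip(image, mask, background)]


def toy_depth_net(weights, pixels):
    """Hypothetical frozen depth network: a fixed linear map (list of rows).

    Off-diagonal weights let background pixels affect the depth predicted
    for object pixels, which is what makes the background worth optimizing.
    """
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def loss_and_grad(weights, image, mask, background, gt_depth):
    """Squared depth error averaged over object pixels, and its gradient
    with respect to the background prompt (network weights stay fixed)."""
    x = composite(image, mask, background)
    pred = toy_depth_net(weights, x)
    n_obj = sum(mask)
    # Only object pixels contribute to the loss.
    residual = [m * (d - g) for m, d, g in zip(mask, pred, gt_depth)]
    loss = sum(r * r for r in residual) / n_obj
    # Backpropagate through the linear net: dL/dx = W^T (2 * residual / n_obj).
    grad_x = [
        sum(2.0 * residual[i] / n_obj * weights[i][j] for i in range(len(pred)))
        for j in range(len(x))
    ]
    # Through the composite, d x / d background = (1 - mask).
    grad_bg = [(1.0 - m) * g for m, g in zip(mask, grad_x)]
    return loss, grad_bg


def mean_loss(weights, samples, background):
    """Average object-pixel loss over (image, mask, gt_depth) samples."""
    return sum(loss_and_grad(weights, im, m, background, gt)[0]
               for im, m, gt in samples) / len(samples)


def learn_background(weights, samples, n_pixels, steps=200, lr=0.5):
    """Plain gradient descent on ONE background shared by all samples."""
    background = [0.0] * n_pixels
    for _ in range(steps):
        total = [0.0] * n_pixels
        for image, mask, gt in samples:
            _, g = loss_and_grad(weights, image, mask, background, gt)
            total = [t + gi for t, gi in zip(total, g)]
        background = [b - lr * t / len(samples) for b, t in zip(background, total)]
    return background


# Toy data: pixels 0-1 belong to the object, pixels 2-3 to the background.
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
mask = [1.0, 1.0, 0.0, 0.0]
samples = [
    ([0.2, 0.4, 0.9, 0.1], mask, [0.7, 0.9, 0.0, 0.0]),
    ([0.6, 0.1, 0.3, 0.8], mask, [0.8, 0.6, 0.0, 0.0]),
]
bg = learn_background(W, samples, n_pixels=4)
print([round(b, 3) for b in bg])  # [0.0, 0.0, 0.7, 1.0]
```

In this toy run the two objects would each prefer a different value for background pixel 2 (1.0 and 0.4), so the shared prompt settles on the compromise 0.7, and the mean object loss falls from 0.1975 with a black background to 0.01125. The inputs' own background pixels never reach the network because `composite` replaces them, which mirrors the invariance to background variations described above. A real implementation would use an autodiff framework and an actual pretrained depth model instead of hand-derived gradients.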
