Background Prompting for Improved Object Depth

Abstract

Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications. However, current methods often fail to produce accurate depth for objects in diverse scenes. In this work, we propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background. We learn the background prompts using only small-scale synthetic object datasets. To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks. Background Prompting helps the depth networks focus on the foreground object, because it makes them invariant to background variations. Moreover, Background Prompting minimizes the domain gap between synthetic and real object images, leading to better sim2real generalization than simple finetuning. Results on multiple synthetic and real datasets demonstrate consistent improvements in real object depths for a variety of existing depth networks. Code and optimized background prompts can be found at: https://mbaradad.github.io/depth_prompt.
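The inference step described in the abstract (place the segmented object on the learned background, then run a depth network) can be illustrated with a minimal sketch. This is not the authors' released code: the function names `composite_on_background` and `predict_object_depth`, the array shapes, the hard binary mask, and the stand-in `depth_net` callable are all assumptions made for illustration, and the background here is a zero array rather than an actual learned prompt.

```python
import numpy as np


def composite_on_background(image, mask, background):
    """Paste the masked foreground object onto a background prompt.

    image:      (H, W, 3) float array, the real input image
    mask:       (H, W) bool or float array, 1 where the object is
    background: (H, W, 3) float array, standing in for a learned prompt
    """
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a trailing axis so the mask broadcasts over the colour channels.
    alpha = mask.astype(image.dtype)[..., None]
    return alpha * image + (1.0 - alpha) * background


def predict_object_depth(image, mask, background, depth_net):
    """Run a depth network on the composite and keep object pixels only.

    depth_net is a hypothetical callable mapping (H, W, 3) -> (H, W).
    """
    composited = composite_on_background(image, mask, background)
    depth = depth_net(composited)
    # Pixels outside the object are marked as NaN (an illustrative choice).
    return np.where(mask.astype(bool), depth, np.nan)


# Toy usage: the "network" is a channel mean, not a real depth model.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
background = np.zeros((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
depth = predict_object_depth(image, mask, background, lambda x: x.mean(axis=-1))
```

In practice `depth_net` would be an off-the-shelf monocular depth model and `background` would be loaded from the optimized prompts; both are left abstract here because the abstract does not specify their interfaces.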
