Background Prompting for Improved Object Depth

Abstract

Estimating the depth of objects from a single image is a valuable task for many vision, robotics, and graphics applications. However, current methods often fail to produce accurate depth for objects in diverse scenes. In this work, we propose a simple yet effective Background Prompting strategy that adapts the input object image with a learned background. We learn the background prompts using only small-scale synthetic object datasets. To infer object depth on a real image, we place the segmented object into the learned background prompt and run off-the-shelf depth networks. Background Prompting makes the depth networks invariant to background variations, which helps them focus on the foreground object. Moreover, Background Prompting minimizes the domain gap between synthetic and real object images, leading to better sim2real generalization than simple finetuning. Results on multiple synthetic and real datasets demonstrate consistent improvements in real object depths for a variety of existing depth networks. Code and optimized background prompts can be found at: https://mbaradad.github.io/depth_prompt.
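
The inference step described in the abstract (place the segmented object on a learned background, then run a depth network) can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names, the array layout, and the `depth_net` callable are hypothetical and chosen only for illustration, and the background is assumed to have been learned and loaded elsewhere.

```python
import numpy as np


def composite_on_background(image, mask, background):
    """Paste the masked foreground object onto a background image.

    image, background: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: bool or float array of shape (H, W); 1 marks object pixels.
    """
    # Add a trailing channel axis so the mask broadcasts over RGB.
    alpha = mask.astype(image.dtype)[..., None]
    return alpha * image + (1.0 - alpha) * background


def predict_object_depth(image, mask, background, depth_net):
    """Run a depth network on the composited image.

    depth_net is a hypothetical callable mapping an (H, W, 3) image to an
    (H, W) depth map; any off-the-shelf monocular depth model could be
    wrapped to fit this assumed interface.
    """
    composite = composite_on_background(image, mask, background)
    depth = depth_net(composite)
    # Only the object's depth is of interest, so background pixels become NaN.
    return np.where(mask.astype(bool), depth, np.nan)
```

In this sketch the depth network itself is left untouched; only its input changes, which matches the abstract's description of adapting the input image rather than the model.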
