Tracking through Containers and Occluders in the Wild

Abstract

Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is, given a video sequence, to segment both the projected extent of the target object and the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that, while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.

