Tracking through Containers and Occluders in the Wild
May 4, 2023
Authors: Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick
cs.AI
Abstract
Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is, given a video sequence, to segment both the projected extent of the target object and the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that, while they can be surprisingly capable of tracking targets under certain forms of task variation, a considerable performance gap remains before we can claim that a tracking model has acquired a true notion of object permanence.
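
To make the task definition concrete, below is a minimal sketch of how per-frame evaluation for this kind of output might look: computing mask IoU separately for the target object and for the surrounding occluder/container channel. The array shapes, channel layout, and function names here are illustrative assumptions; the abstract does not specify TCOW's actual data format or evaluation protocol.

```python
# Hypothetical evaluation sketch for a two-channel video segmentation task
# (channel 0 = target object, channel 1 = occluder/container, when present).
# Shapes and conventions are assumptions, not TCOW's published protocol.
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of shape (H, W)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Treat two empty masks as a perfect match.
    return float(inter) / union if union > 0 else 1.0

def evaluate_video(preds: np.ndarray, gts: np.ndarray) -> dict:
    """preds, gts: boolean arrays of shape (T, C, H, W), where C indexes
    the mask channels (0 = target, 1 = occluder/container)."""
    channels = ["target", "occluder_or_container"]
    scores = {name: [] for name in channels}
    for t in range(preds.shape[0]):
        for c, name in enumerate(channels):
            # Skip frames where neither prediction nor ground truth exists,
            # e.g. no container/occluder is present in that frame.
            if gts[t, c].any() or preds[t, c].any():
                scores[name].append(mask_iou(preds[t, c], gts[t, c]))
    return {name: float(np.mean(vals)) if vals else float("nan")
            for name, vals in scores.items()}
```

Scoring the occluder/container channel only on frames where one exists (or is predicted) mirrors the task statement that this mask is required "whenever one exists"; a benchmark could equally penalize false-positive container predictions more strictly.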