
Tracking Any Object Amodally

December 19, 2023
Authors: Cheng-Yen Hsieh, Tarasha Khurana, Achal Dave, Deva Ramanan
cs.AI

Abstract

Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of modal annotations in most datasets. To address the scarcity of amodal data, we introduce the TAO-Amodal benchmark, featuring 880 diverse categories in thousands of video sequences. Our dataset includes amodal and modal bounding boxes for visible and occluded objects, including objects that are partially out-of-frame. To enhance amodal tracking with object permanence, we leverage a lightweight plug-in module, the amodal expander, to transform standard, modal trackers into amodal ones through fine-tuning on a few hundred video sequences with data augmentation. We achieve a 3.3% and 1.6% improvement on the detection and tracking of occluded objects on TAO-Amodal. When evaluated on people, our method produces dramatic improvements of 2x compared to state-of-the-art modal baselines.
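For intuition, below is a minimal sketch of what such a plug-in expander could look like: a small regression head bolted onto a modal tracker that refines each modal box into an amodal one from pooled region features. This is not the paper's implementation; the module structure, feature dimension, and (cx, cy, w, h) box parameterization with log-scale deltas are illustrative assumptions.

```python
# Sketch of an "amodal expander"-style plug-in head (illustrative, not the
# authors' code). It takes a modal tracker's box plus its RoI features and
# regresses an amodal box that may extend beyond the image frame.
import torch
import torch.nn as nn

class AmodalExpander(nn.Module):
    """Lightweight head: modal box + RoI features -> amodal box."""

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        # Input: pooled RoI features concatenated with the 4 modal box coords.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),  # predicted (dx, dy, dw, dh) deltas
        )

    def forward(self, roi_feats: torch.Tensor, modal_boxes: torch.Tensor) -> torch.Tensor:
        # roi_feats: (N, feat_dim); modal_boxes: (N, 4) as (cx, cy, w, h).
        deltas = self.mlp(torch.cat([roi_feats, modal_boxes], dim=-1))
        cx = modal_boxes[:, 0] + deltas[:, 0] * modal_boxes[:, 2]
        cy = modal_boxes[:, 1] + deltas[:, 1] * modal_boxes[:, 3]
        w = modal_boxes[:, 2] * torch.exp(deltas[:, 2])
        h = modal_boxes[:, 3] * torch.exp(deltas[:, 3])
        # Deliberately no clipping to image bounds: amodal boxes can be
        # partially out-of-frame, matching the TAO-Amodal annotations.
        return torch.stack([cx, cy, w, h], dim=-1)
```

Because the base tracker stays frozen in spirit and only this small head is tuned, such a design fits the abstract's claim of converting a modal tracker to an amodal one with fine-tuning on only a few hundred video sequences.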