Tracking Any Object Amodally
December 19, 2023
Authors: Cheng-Yen Hsieh, Tarasha Khurana, Achal Dave, Deva Ramanan
cs.AI
Abstract
Amodal perception, the ability to comprehend complete object structures from
partial visibility, is a fundamental skill, even for infants. Its significance
extends to applications like autonomous driving, where a clear understanding of
heavily occluded objects is essential. However, modern detection and tracking
algorithms often overlook this critical capability, perhaps due to the
prevalence of modal annotations in most datasets. To address the scarcity of
amodal data, we introduce the TAO-Amodal benchmark, featuring 880 diverse
categories in thousands of video sequences. Our dataset includes amodal and
modal bounding boxes for visible and occluded objects, including objects that
are partially out-of-frame. To enhance amodal tracking with object permanence,
we leverage a lightweight plug-in module, the amodal expander, to transform
standard, modal trackers into amodal ones through fine-tuning on a few hundred
video sequences with data augmentation. We achieve a 3.3% and 1.6%
improvement on the detection and tracking of occluded objects on TAO-Amodal.
When evaluated on people, our method produces dramatic improvements of 2x
compared to state-of-the-art modal baselines.
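To make the "amodal expander" idea concrete, here is a minimal sketch of how such a plug-in head could sit on top of a frozen modal tracker. This is an illustrative assumption based only on the abstract, not the authors' implementation: the class name AmodalExpander, the feature dimension, the (cx, cy, w, h) box format, and the delta parameterization are all hypothetical.

```python
import torch
import torch.nn as nn

class AmodalExpander(nn.Module):
    """Hypothetical lightweight head that maps a modal tracker's box
    prediction (plus its per-detection features) to an amodal box.
    All dimensions and parameterizations here are illustrative."""

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        # Small MLP: consumes ROI features concatenated with the modal box,
        # predicts offsets that "expand" the modal box to the amodal extent.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 4),  # (dx, dy, dw, dh) deltas
        )

    def forward(self, roi_feats: torch.Tensor, modal_boxes: torch.Tensor) -> torch.Tensor:
        # roi_feats: (N, feat_dim) features from the frozen modal tracker
        # modal_boxes: (N, 4) boxes in (cx, cy, w, h) format
        deltas = self.mlp(torch.cat([roi_feats, modal_boxes], dim=-1))
        cx = modal_boxes[:, 0] + deltas[:, 0] * modal_boxes[:, 2]
        cy = modal_boxes[:, 1] + deltas[:, 1] * modal_boxes[:, 3]
        w = modal_boxes[:, 2] * torch.exp(deltas[:, 2])
        h = modal_boxes[:, 3] * torch.exp(deltas[:, 3])
        # Amodal boxes may extend beyond the visible region or the frame.
        return torch.stack([cx, cy, w, h], dim=-1)
```

Under this reading, only the expander would be fine-tuned on the few hundred annotated video sequences, with occlusion-style data augmentation (e.g., pasting occluders over visible objects while keeping the full amodal box as the regression target), leaving the base modal tracker untouched.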