Tracking Any Object Amodally

Abstract

Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of modal annotations in most datasets. To address the scarcity of amodal data, we introduce the TAO-Amodal benchmark, featuring 880 diverse categories in thousands of video sequences. Our dataset includes amodal and modal bounding boxes for visible and occluded objects, including objects that are partially out-of-frame. To enhance amodal tracking with object permanence, we leverage a lightweight plug-in module, the amodal expander, to transform standard modal trackers into amodal ones through fine-tuning on a few hundred video sequences with data augmentation. On TAO-Amodal, we achieve improvements of 3.3% and 1.6% in the detection and tracking of occluded objects, respectively. When evaluated on people, our method produces dramatic improvements of 2x compared to state-of-the-art modal baselines.
