Tracking Any Object Amodally

Abstract

Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of modal annotations in most datasets. To address the scarcity of amodal data, we introduce the TAO-Amodal benchmark, featuring 880 diverse categories in thousands of video sequences. Our dataset includes amodal and modal bounding boxes for visible and occluded objects, including objects that are partially out of frame. To enhance amodal tracking with object permanence, we use a lightweight plug-in module, the amodal expander, to transform standard modal trackers into amodal ones through fine-tuning on a few hundred video sequences with data augmentation. We achieve improvements of 3.3% and 1.6% in the detection and tracking of occluded objects, respectively, on TAO-Amodal. When evaluated on people, our method yields a 2x improvement over state-of-the-art modal baselines.
