MOVE: Motion-Guided Few-Shot Video Object Segmentation

Abstract


This work addresses motion-guided few-shot video object segmentation (FSVOS), which aims to segment dynamic objects in videos based on a few annotated examples that share the same motion pattern. Existing FSVOS datasets and methods typically focus on object categories, a static attribute that ignores the rich temporal dynamics of videos; this limits their use in scenarios that require motion understanding. To fill this gap, we introduce MOVE, a large-scale dataset designed specifically for motion-guided FSVOS. On MOVE, we comprehensively evaluate six state-of-the-art methods drawn from three related tasks under two experimental settings. The results show that current methods struggle with motion-guided FSVOS, which leads us to analyze the associated challenges and to propose a baseline method, the Decoupled Motion Appearance Network (DMA). Experiments demonstrate that our approach achieves superior performance in few-shot motion understanding, establishing a solid foundation for future research in this direction.
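To make the task formulation concrete, the following is a minimal sketch of how a single motion-guided K-shot episode might be sampled: K annotated support videos and one disjoint query video are drawn so that they all share one motion pattern. This is an illustrative assumption, not the MOVE benchmark's actual sampling protocol; the names `Episode`, `sample_episode`, and `videos_by_motion` are hypothetical, and real episodes would also carry frames and per-frame masks rather than bare video IDs.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    """One hypothetical motion-guided few-shot episode (IDs only, no frames/masks)."""
    motion: str             # shared motion pattern, e.g. "jumping"
    support_ids: List[str]  # K annotated videos exhibiting that motion
    query_id: str           # video in which objects performing the motion must be segmented


def sample_episode(
    videos_by_motion: Dict[str, List[str]],
    k_shot: int,
    rng: Optional[random.Random] = None,
) -> Episode:
    """Sample K support videos and one disjoint query video sharing one motion.

    Hypothetical helper: assumes videos are indexed by a single motion label.
    """
    rng = rng or random.Random()
    # Only motions with at least K + 1 videos can supply a query disjoint from the support set.
    eligible = sorted(m for m, vids in videos_by_motion.items() if len(vids) > k_shot)
    if not eligible:
        raise ValueError("no motion class has more than k_shot videos")
    motion = rng.choice(eligible)
    # Draw K + 1 distinct videos: the first K form the support set, the last is the query.
    picked = rng.sample(videos_by_motion[motion], k_shot + 1)
    return Episode(motion=motion, support_ids=picked[:k_shot], query_id=picked[k_shot])
```

For example, `sample_episode({"jumping": ["v1", "v2", "v3"], "waving": ["v4", "v5"]}, k_shot=2)` can only pick "jumping", since "waving" has too few videos to leave a disjoint query. Keying episodes on motion rather than object category mirrors the distinction the abstract draws between motion-guided and category-guided FSVOS.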
