MOVE: Motion-Guided Few-Shot Video Object Segmentation

Abstract

This work addresses motion-guided few-shot video object segmentation (FSVOS), which aims to segment dynamic objects in videos based on a few annotated examples that share the same motion pattern. Existing FSVOS datasets and methods typically focus on object categories, static attributes that ignore the rich temporal dynamics in videos, which limits their application in scenarios requiring motion understanding. To fill this gap, we introduce MOVE, a large-scale dataset specifically designed for motion-guided FSVOS. Based on MOVE, we comprehensively evaluate six state-of-the-art methods from three related tasks under two experimental settings. Our results reveal that current methods struggle with motion-guided FSVOS, prompting us to analyze the associated challenges and propose a baseline method, the Decoupled Motion Appearance Network (DMA). Experiments demonstrate that our approach achieves superior performance in few-shot motion understanding, establishing a solid foundation for future research in this direction.
