MOVE: Motion-Guided Few-Shot Video Object Segmentation
July 29, 2025
Authors: Kaining Ying, Hengrui Hu, Henghui Ding
cs.AI
Abstract
This work addresses motion-guided few-shot video object segmentation (FSVOS),
which aims to segment dynamic objects in videos based on a few annotated
examples with the same motion patterns. Existing FSVOS datasets and methods
typically focus on object categories, which are static attributes that ignore
the rich temporal dynamics in videos, limiting their application in scenarios
requiring motion understanding. To fill this gap, we introduce MOVE, a
large-scale dataset specifically designed for motion-guided FSVOS. Based on
MOVE, we comprehensively evaluate 6 state-of-the-art methods from 3 different
related tasks across 2 experimental settings. Our results reveal that current
methods struggle to address motion-guided FSVOS, prompting us to analyze the
associated challenges and propose a baseline method, Decoupled Motion
Appearance Network (DMA). Experiments demonstrate that our approach achieves
superior performance in few-shot motion understanding, establishing a solid
foundation for future research in this direction.
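The abstract does not describe DMA's architecture beyond the fact that it decouples motion from appearance. As a rough illustration of that decoupling idea only, here is a minimal, hypothetical PyTorch sketch: one branch encodes per-frame appearance, another encodes frame-difference motion cues, and a masked-pooled support prototype over both guides query segmentation. Every class and layer name here is invented for illustration and is not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoupledMotionAppearanceSketch(nn.Module):
    """Toy sketch of a decoupled motion/appearance few-shot segmenter.

    Hypothetical: the abstract only states that motion and appearance
    are decoupled; both branch designs below are illustrative guesses.
    """

    def __init__(self, dim: int = 64):
        super().__init__()
        # Appearance branch: per-frame features (the static "what").
        self.appearance = nn.Conv2d(3, dim, kernel_size=3, padding=1)
        # Motion branch: features of frame differences (the dynamic "how").
        self.motion = nn.Conv2d(3, dim, kernel_size=3, padding=1)

    def encode(self, video: torch.Tensor) -> torch.Tensor:
        # video: (T, 3, H, W); motion cue = consecutive-frame difference,
        # zero-padded at the last frame to keep T aligned.
        diffs = torch.cat([video[1:] - video[:-1], torch.zeros_like(video[:1])])
        app = self.appearance(video)          # (T, D, H, W)
        mot = self.motion(diffs)              # (T, D, H, W)
        return torch.cat([app, mot], dim=1)   # (T, 2D, H, W)

    def forward(self, support_video, support_mask, query_video):
        # Masked average pooling over the support clip yields a
        # motion-pattern prototype; the query is scored per pixel by
        # cosine similarity to that prototype.
        s_feat = self.encode(support_video)                    # (T, 2D, H, W)
        proto = (s_feat * support_mask.unsqueeze(1)).sum((0, 2, 3))
        proto = proto / support_mask.sum().clamp(min=1.0)      # (2D,)
        q_feat = self.encode(query_video)                      # (T, 2D, H, W)
        sim = F.cosine_similarity(q_feat, proto.view(1, -1, 1, 1), dim=1)
        return sim  # (T, H, W); threshold to obtain query masks


# Usage with dummy tensors (8-frame 64x64 clips):
support = torch.randn(8, 3, 64, 64)
mask = (torch.rand(8, 64, 64) > 0.5).float()
query = torch.randn(8, 3, 64, 64)
model = DecoupledMotionAppearanceSketch()
print(model(support, mask, query).shape)  # torch.Size([8, 64, 64])
```

The frame-difference input is the simplest possible motion cue; it stands in for whatever temporal representation DMA actually uses, and its only purpose here is to show why a motion-guided prototype differs from the category-level appearance prototypes used in prior FSVOS work.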