Segment Any Motion in Videos
March 28, 2025
Authors: Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, Qianqian Wang
cs.AI
Abstract
Moving object segmentation is a crucial task for achieving a high-level
understanding of visual scenes and has numerous downstream applications. Humans
can effortlessly segment moving objects in videos. Previous work has largely
relied on optical flow to provide motion cues; however, this approach often
results in imperfect predictions due to challenges such as partial motion,
complex deformations, motion blur and background distractions. We propose a
novel approach for moving object segmentation that combines long-range
trajectory motion cues with DINO-based semantic features and leverages SAM2 for
pixel-level mask densification through an iterative prompting strategy. Our
model employs Spatio-Temporal Trajectory Attention and Motion-Semantic
Decoupled Embedding to prioritize motion while integrating semantic support.
Extensive testing on diverse datasets demonstrates state-of-the-art
performance, excelling in challenging scenarios and fine-grained segmentation
of multiple objects. Our code is available at https://motion-seg.github.io/.
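
As a rough illustration of the pipeline the abstract describes, the sketch below classifies long-range point trajectories as moving or static using attention over fused motion and DINO semantic features, then uses points from the moving trajectories as prompts for a SAM-style predictor to densify them into pixel-level masks. All module names, tensor shapes, and the predictor interface here are illustrative assumptions, not the authors' implementation; the paper's spatio-temporal attention and iterative prompting strategy are simplified to a single attention stage and a single prompt pass.

```python
# Hypothetical sketch of the described pipeline (not the released code).
import torch
import torch.nn as nn


class TrajectoryMotionClassifier(nn.Module):
    """Per-trajectory moving/static classification from motion + semantic cues."""

    def __init__(self, motion_dim, semantic_dim, d_model=256, num_layers=4, num_heads=8):
        super().__init__()
        # Separate projections loosely mirror the idea of keeping motion and
        # semantic cues in decoupled embeddings before fusing them.
        self.motion_proj = nn.Linear(motion_dim, d_model)
        self.semantic_proj = nn.Linear(semantic_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=num_heads, batch_first=True
        )
        # Attention over trajectory tokens; temporal context is assumed to be
        # pooled into the per-trajectory motion descriptors beforehand.
        self.trajectory_attention = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Linear(d_model, 1)  # one moving/static logit per trajectory

    def forward(self, motion_feats, semantic_feats):
        # motion_feats:   (B, N, motion_dim)   long-range trajectory descriptors
        # semantic_feats: (B, N, semantic_dim) DINO features sampled at track points
        tokens = self.motion_proj(motion_feats) + self.semantic_proj(semantic_feats)
        tokens = self.trajectory_attention(tokens)
        return self.head(tokens).squeeze(-1)  # (B, N) logits


def densify_with_sam(predictor, image, track_points, moving_logits, threshold=0.0):
    """Turn moving trajectory points into a dense mask via point prompts.

    `predictor` is assumed to expose a SAM-style `set_image` / `predict`
    interface taking point coordinates and labels; adapt this call to the
    actual SAM2 API. `track_points` is (N, 2) pixel coordinates and
    `moving_logits` is (N,) for a single frame.
    """
    moving = moving_logits > threshold
    coords = track_points[moving]                        # (M, 2) positive prompt points
    labels = torch.ones(len(coords), dtype=torch.int)    # all prompts are foreground
    predictor.set_image(image)
    masks, _, _ = predictor.predict(
        point_coords=coords.cpu().numpy(), point_labels=labels.numpy()
    )
    return masks
```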