Segment Any Motion in Videos

March 28, 2025
Authors: Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, Qianqian Wang
cs.AI

Abstract

Moving object segmentation is a crucial task for achieving a high-level understanding of visual scenes and has numerous downstream applications. Humans can effortlessly segment moving objects in videos. Previous work has largely relied on optical flow to provide motion cues; however, this approach often results in imperfect predictions due to challenges such as partial motion, complex deformations, motion blur and background distractions. We propose a novel approach for moving object segmentation that combines long-range trajectory motion cues with DINO-based semantic features and leverages SAM2 for pixel-level mask densification through an iterative prompting strategy. Our model employs Spatio-Temporal Trajectory Attention and Motion-Semantic Decoupled Embedding to prioritize motion while integrating semantic support. Extensive testing on diverse datasets demonstrates state-of-the-art performance, excelling in challenging scenarios and fine-grained segmentation of multiple objects. Our code is available at https://motion-seg.github.io/.
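The abstract names the SAM2 densification step but does not spell it out. Below is a minimal, hypothetical sketch of one way iterative point prompting could turn sparse points on trajectories classified as moving into dense masks for a single frame: prompt a segmenter with a moving point that no mask covers yet, keep the returned mask, drop every moving point it covers, and repeat. The names `densify_by_iterative_prompting`, `segment_from_point`, and `toy_segmenter` are illustrative assumptions, not the authors' code or the SAM2 API; `toy_segmenter` is a stub standing in for SAM2, and a real pipeline would also propagate prompts across frames.

```python
from typing import Callable, List, Set, Tuple

Point = Tuple[int, int]  # (row, col) pixel coordinate in one frame
Mask = Set[Point]        # binary mask stored as the set of its foreground pixels


def densify_by_iterative_prompting(
    moving_points: List[Point],
    segment_from_point: Callable[[Point], Mask],
    max_prompts: int = 10,
) -> List[Mask]:
    """Hypothetical sketch: densify sparse moving points into masks.

    `segment_from_point` is an assumed stand-in for a promptable
    segmenter such as SAM2: it takes one point prompt, returns one mask.
    """
    masks: List[Mask] = []
    uncovered = list(moving_points)
    for _ in range(max_prompts):
        if not uncovered:
            break  # every moving point already lies inside some mask
        prompt = uncovered[0]  # any moving point not yet explained by a mask
        mask = segment_from_point(prompt)
        masks.append(mask)
        # Drop the prompt itself (so it is never reused) and every other
        # moving point the new mask already covers.
        uncovered = [p for p in uncovered[1:] if p not in mask]
    return masks


def toy_segmenter(point: Point) -> Mask:
    """Stub segmenter: two hypothetical 3x3 objects on a small grid."""
    objects = [
        {(r, c) for r in range(0, 3) for c in range(0, 3)},
        {(r, c) for r in range(5, 8) for c in range(5, 8)},
    ]
    for obj in objects:
        if point in obj:
            return set(obj)
    return {point}  # background point: fall back to a single-pixel mask


if __name__ == "__main__":
    # Four moving points, two on each toy object: two prompts suffice.
    result = densify_by_iterative_prompting(
        [(0, 0), (1, 2), (6, 6), (7, 5)], toy_segmenter
    )
    print(len(result), [len(m) for m in result])
```

Prompting only with points that no existing mask covers keeps the number of segmenter calls close to the number of moving objects rather than the number of tracked points; the `max_prompts` cap is an assumed safeguard against a segmenter that keeps returning tiny masks.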

March 31, 2025