MatAnyone: Stable Video Matting with Consistent Memory Propagation

January 24, 2025
Authors: Peiqing Yang, Shangchen Zhou, Jixin Zhao, Qingyi Tao, Chen Change Loy
cs.AI

Abstract

Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.
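To make the core idea concrete, the sketch below illustrates how region-adaptive memory fusion could blend propagated memory with current-frame features. This is a minimal illustration under our own assumptions, not the paper's implementation: the function name `region_adaptive_memory_fusion`, the tensor names, and the per-pixel `boundary_prob` weight are all hypothetical stand-ins for the mechanism described in the abstract.

```python
import torch


def region_adaptive_memory_fusion(curr_feat: torch.Tensor,
                                  prev_mem: torch.Tensor,
                                  boundary_prob: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of region-adaptive memory fusion.

    curr_feat:     features estimated from the current frame      (B, C, H, W)
    prev_mem:      memory propagated from the previous frame      (B, C, H, W)
    boundary_prob: per-pixel probability of lying on an object
                   boundary, in [0, 1]                            (B, 1, H, W)

    Core regions (boundary_prob near 0) keep the propagated memory,
    giving semantic stability; boundary regions (boundary_prob near 1)
    favour current-frame features, preserving fine-grained details.
    """
    w = boundary_prob.clamp(0.0, 1.0)
    return (1.0 - w) * prev_mem + w * curr_feat


# Toy usage: fuse random features for a single 64x64 frame.
if __name__ == "__main__":
    curr = torch.randn(1, 32, 64, 64)
    mem = torch.randn(1, 32, 64, 64)
    boundary = torch.rand(1, 1, 64, 64)
    fused = region_adaptive_memory_fusion(curr, mem, boundary)
    print(fused.shape)  # torch.Size([1, 32, 64, 64])
```

The key design choice this sketch captures is that the fusion weight varies per pixel rather than being a single global coefficient, so stable interior regions and uncertain boundary regions are treated differently within the same frame.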