MatAnyone: Stable Video Matting with Consistent Memory Propagation

Abstract

Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.
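The region-adaptive fusion idea described above can be illustrated with a minimal sketch. This is not MatAnyone's implementation. It assumes a hypothetical per-pixel `boundary_prob` map in [0, 1], where values near 0 mark stable core regions and values near 1 mark object boundaries. It also assumes a simple linear blend and works on plain 2D lists of scalars rather than feature tensors. The function name `region_adaptive_fuse` is likewise hypothetical.

```python
from typing import List

Grid = List[List[float]]  # 2D grid of scalars standing in for a feature map


def region_adaptive_fuse(prev_memory: Grid, curr_readout: Grid, boundary_prob: Grid) -> Grid:
    """Hypothetical sketch of region-adaptive memory fusion.

    Core pixels (boundary_prob near 0) mostly keep the previous frame's memory,
    favoring semantic stability. Boundary pixels (boundary_prob near 1) mostly
    take the current frame's readout, favoring fine-grained detail.
    """
    if not (len(prev_memory) == len(curr_readout) == len(boundary_prob)):
        raise ValueError("row count mismatch")
    fused: Grid = []
    for p_row, c_row, b_row in zip(prev_memory, curr_readout, boundary_prob):
        if not (len(p_row) == len(c_row) == len(b_row)):
            raise ValueError("column count mismatch")
        row = []
        for p, c, b in zip(p_row, c_row, b_row):
            if not 0.0 <= b <= 1.0:
                raise ValueError("boundary_prob must lie in [0, 1]")
            # Linear blend: weight b on the current readout, (1 - b) on the previous memory.
            row.append(b * c + (1.0 - b) * p)
        fused.append(row)
    return fused
```

For example, `region_adaptive_fuse([[1.0, 1.0]], [[0.0, 0.5]], [[0.0, 1.0]])` returns `[[1.0, 0.5]]`. The core pixel keeps its previous value, and the boundary pixel takes the current readout. A learned model would predict the weights rather than receive them as input.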