Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Abstract

Estimating motion in videos is an essential computer vision problem with many downstream applications, including controllable video generation and robotics. Current solutions are either trained primarily on synthetic data or require tuning situation-specific heuristics, both of which inherently limit these models' capabilities in real-world contexts. Despite recent developments in large-scale self-supervised learning from videos, leveraging such representations for motion estimation remains relatively underexplored. In this work, we develop Opt-CWM, a self-supervised technique for flow and occlusion estimation from a pre-trained next-frame prediction model. Opt-CWM works by learning to optimize counterfactual probes that extract motion information from a base video model, avoiding the need for fixed heuristics while training on unrestricted video inputs. We achieve state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data.
