Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
March 25, 2025
Authors: Stefan Stojanov, David Wendt, Seungwoo Kim, Rahul Venkatesh, Kevin Feigelis, Jiajun Wu, Daniel LK Yamins
cs.AI
Abstract
Estimating motion in videos is an essential computer vision problem with many
downstream applications, including controllable video generation and robotics.
Current solutions are primarily trained using synthetic data or require tuning
of situation-specific heuristics, which inherently limits these models'
capabilities in real-world contexts. Despite recent developments in large-scale
self-supervised learning from videos, leveraging such representations for
motion estimation remains relatively underexplored. In this work, we develop
Opt-CWM, a self-supervised technique for flow and occlusion estimation from a
pre-trained next-frame prediction model. Opt-CWM works by learning to optimize
counterfactual probes that extract motion information from a base video model,
avoiding the need for fixed heuristics while training on unrestricted video
inputs. We achieve state-of-the-art performance for motion estimation on
real-world videos while requiring no labeled data.
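
The idea of a counterfactual probe can be illustrated with a toy sketch. This is not the paper's implementation: `toy_predictor` is a hypothetical stand-in for a pre-trained next-frame predictor (it only translates the frame by a fixed offset), and the single-pixel bump is a fixed, hand-picked probe, which is the kind of heuristic Opt-CWM is described as learning to replace. The sketch only shows how comparing a clean prediction with a perturbed one can reveal where a point moves.

```python
import numpy as np

def toy_predictor(frame, shift=(2, 3)):
    """Hypothetical stand-in for a next-frame predictor.

    It simply translates the frame by `shift` = (dy, dx) with wrap-around.
    """
    return np.roll(frame, shift=shift, axis=(0, 1))

def probe_flow(frame, point, predictor, amplitude=1.0):
    """Estimate the displacement of `point` using a counterfactual probe.

    Assumption: the probe is a single-pixel intensity bump, chosen by hand.
    """
    y, x = point
    perturbed = frame.copy()
    perturbed[y, x] += amplitude  # inject the probe at the query point

    # The difference between the two predictions is non-zero only where
    # the probe ended up in the predicted next frame.
    diff = np.abs(predictor(perturbed) - predictor(frame))
    ny, nx = np.unravel_index(np.argmax(diff), diff.shape)
    return int(ny - y), int(nx - x)  # displacement as (dy, dx)

rng = np.random.default_rng(0)
frame = rng.random((32, 32))
flow = probe_flow(frame, (10, 12), toy_predictor)
print(flow)
```

With a real predictor the response to a probe is diffuse and scene-dependent, so a fixed bump like this one may be a poor choice, which is the motivation the abstract gives for optimizing the probes instead.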