Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Abstract

Estimating motion in videos is an essential computer vision problem with many downstream applications, including controllable video generation and robotics. Current solutions are primarily trained using synthetic data or require tuning of situation-specific heuristics, which inherently limits these models' capabilities in real-world contexts. Despite recent developments in large-scale self-supervised learning from videos, leveraging such representations for motion estimation remains relatively underexplored. In this work, we develop Opt-CWM, a self-supervised technique for flow and occlusion estimation from a pre-trained next-frame prediction model. Opt-CWM works by learning to optimize counterfactual probes that extract motion information from a base video model, avoiding the need for fixed heuristics while training on unrestricted video inputs. We achieve state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data.
