Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Abstract

Estimating motion in videos is an essential computer vision problem with many downstream applications, including controllable video generation and robotics. Current solutions are primarily trained using synthetic data or require tuning of situation-specific heuristics, which inherently limits these models' capabilities in real-world contexts. Despite recent developments in large-scale self-supervised learning from videos, leveraging such representations for motion estimation remains relatively underexplored. In this work, we develop Opt-CWM, a self-supervised technique for flow and occlusion estimation from a pre-trained next-frame prediction model. Opt-CWM works by learning to optimize counterfactual probes that extract motion information from a base video model, avoiding the need for fixed heuristics while training on unrestricted video inputs. We achieve state-of-the-art performance for motion estimation on real-world videos while requiring no labeled data.
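The counterfactual-probing idea can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `toy_next_frame_predictor`, `gaussian_bump`, and `counterfactual_flow` are hypothetical. The "pre-trained" next-frame predictor is replaced by a fixed translation so the true flow is known, and the probe is a fixed, hand-chosen Gaussian bump. The probe perturbs frame 0 at a query pixel and runs the predictor on both the clean and the perturbed input. The flow is then read off the location where the two predictions differ most.

```python
import numpy as np


def toy_next_frame_predictor(frame0, shift=(2, 3)):
    """Hypothetical stand-in for a pre-trained next-frame predictor.

    It 'predicts' frame 1 by translating frame 0 by a fixed (dy, dx),
    so the ground-truth flow is known and the probe can be checked.
    """
    return np.roll(frame0, shift=shift, axis=(0, 1))


def gaussian_bump(shape, center, sigma=1.0, amplitude=1.0):
    """Fixed, hand-designed perturbation (assumption): a Gaussian blob at `center`."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center
    return amplitude * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


def counterfactual_flow(frame0, predictor, query, sigma=1.0, amplitude=1.0):
    """Estimate the flow at `query` with a counterfactual probe.

    1. Predict the next frame from the clean frame 0.
    2. Predict it again after adding a small perturbation at `query`.
    3. The pixel where the two predictions differ most is taken as the
       query point's location in the next frame; flow = that pixel - query.
    """
    clean_pred = predictor(frame0)
    perturbed = frame0 + gaussian_bump(frame0.shape, query, sigma, amplitude)
    cf_pred = predictor(perturbed)
    diff = np.abs(cf_pred - clean_pred)
    y1, x1 = np.unravel_index(np.argmax(diff), diff.shape)
    return (int(y1) - query[0], int(x1) - query[1])


# Usage: a random 32x32 frame whose content moves by (dy, dx) = (2, 3).
rng = np.random.default_rng(0)
frame0 = rng.random((32, 32))
print(counterfactual_flow(frame0, toy_next_frame_predictor, query=(10, 10)))  # (2, 3)
```

A fixed bump like this is the kind of hand-tuned heuristic that the abstract says Opt-CWM avoids; there, the probe itself is optimized while training on unrestricted video. A real video model is also nonlinear, so its difference map would be noisier than in this toy case, where the translation makes the perturbation reappear exactly.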