Seeing Fast and Slow: Learning the Flow of Time in Videos
April 23, 2026
Authors: Yen-Siang Wu, Rundong Luo, Jingsen Zhu, Tao Tu, Ali Farhadi, Matthew Wallingford, Yu-Chiang Frank Wang, Steve Marschner, Wei-Chiu Ma
cs.AI
Abstract
How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in videos. We first exploit the multimodal cues and temporal structure naturally present in videos to learn, in a self-supervised manner, to detect speed changes and estimate playback speed. We then show that these learned temporal reasoning models enable us to curate the largest slow-motion video dataset to date from noisy in-the-wild sources. Such slow-motion footage, typically filmed by high-speed cameras, contains substantially richer temporal detail than standard videos. Using this data, we further develop models capable of temporal control, including speed-conditioned video generation, which produces motion at a specified playback speed, and temporal super-resolution, which transforms low-FPS, blurry videos into high-FPS sequences with fine-grained temporal details. Our findings highlight time as a manipulable, perceptual dimension in video learning, opening doors to temporally controllable video generation, temporal forensics detection, and potentially richer world models that understand how events unfold over time.
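As a rough illustration of how playback speed can be turned into a self-supervised label, the sketch below samples frame indices from a source video at a chosen stride and uses that stride as the training target. This is a generic frame-skipping recipe, not the method described in the paper. The names `SPEEDS`, `sample_clip_indices`, and `make_training_example` are hypothetical, and the sketch assumes the source video plays at normal speed, which noisy in-the-wild footage does not guarantee.

```python
import random

# Hypothetical speed-up factors; each one doubles as a class label.
SPEEDS = (1, 2, 4)


def sample_clip_indices(num_frames, clip_len, speed, rng):
    """Pick frame indices that replay a video at `speed`x by skipping frames."""
    # Distance between the first and last sampled frame.
    span = (clip_len - 1) * speed
    if span >= num_frames:
        raise ValueError("video too short for this clip length and speed")
    # Choose a start so the last index stays inside the video.
    start = rng.randrange(num_frames - span)
    return [start + i * speed for i in range(clip_len)]


def make_training_example(num_frames, clip_len=8, rng=None):
    """Return (frame_indices, label), where label indexes into SPEEDS."""
    rng = rng or random.Random()
    label = rng.randrange(len(SPEEDS))
    indices = sample_clip_indices(num_frames, clip_len, SPEEDS[label], rng)
    return indices, label
```

A classifier trained on such pairs sees only the sampled frames and has to predict the label, so the supervision comes for free from the resampling step. Slowing footage down cannot be simulated this way without interpolating frames, which is one reason real high-frame-rate footage of the kind the abstract describes is useful.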