Impossible Videos
March 18, 2025
Authors: Zechen Bai, Hai Ci, Mike Zheng Shou
cs.AI
Abstract
Synthetic videos are now widely used to compensate for the scarcity and
limited diversity of real-world videos. Current synthetic datasets primarily
replicate real-world scenarios, leaving impossible, counterfactual, and
anti-reality video concepts underexplored. This work aims to answer two
questions: 1) Can today's video generation models effectively follow prompts
to create impossible video content? 2) Are today's video understanding models
good enough to understand impossible videos? To this end, we introduce
IPV-Bench, a novel benchmark designed to evaluate and foster progress in video
understanding and generation. IPV-Bench is underpinned by a comprehensive
taxonomy encompassing 4 domains and 14 categories. It features diverse scenes
that defy physical, biological, geographical, or social laws. Based on the
taxonomy, a prompt suite is constructed to evaluate video generation models,
challenging their prompt-following and creativity capabilities. In addition, a
video benchmark is curated to assess Video-LLMs on their ability to understand
impossible videos, which particularly requires reasoning about temporal
dynamics and world knowledge. Comprehensive evaluations reveal limitations of
current video models and offer insights for future directions, paving the way
for next-generation video models.