Impossible Videos
March 18, 2025
Authors: Zechen Bai, Hai Ci, Mike Zheng Shou
cs.AI
Abstract
Synthetic videos are nowadays widely used to complement the scarcity and limited
diversity of real-world video data. Current synthetic datasets primarily replicate
real-world scenarios, leaving impossible, counterfactual, and anti-reality video
concepts underexplored. This work aims to answer two questions: 1) Can today's
video generation models effectively follow prompts to create impossible video
content? 2) Are today's video understanding models good enough for
understanding impossible videos? To this end, we introduce IPV-Bench, a novel
benchmark designed to evaluate and foster progress in video understanding and
generation. IPV-Bench is underpinned by a comprehensive taxonomy encompassing
4 domains and 14 categories. It features diverse scenes that defy physical,
biological, geographical, or social laws. Based on the taxonomy, a prompt suite
is constructed to evaluate video generation models, challenging their
prompt-following and creative capabilities. In addition, a video benchmark is
curated to assess Video-LLMs on their ability to understand impossible
videos, which particularly requires reasoning over temporal dynamics and world
knowledge. Comprehensive evaluations reveal the limitations of current video models
and offer insights for future directions, paving the way for next-generation video models.