MovieCORE: COgnitive REasoning in Movies
August 26, 2025
Authors: Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu
cs.AI
Abstract
This paper introduces MovieCORE, a novel video question answering (VQA)
dataset designed to probe deeper cognitive understanding of movie content.
Unlike existing datasets that focus on surface-level comprehension, MovieCORE
emphasizes questions that engage System-2 thinking while remaining specific to
the video material. We present an innovative agentic brainstorming approach,
utilizing multiple large language models (LLMs) as thought agents to generate
and refine high-quality question-answer pairs. To evaluate dataset quality, we
develop a set of cognitive tests assessing depth, thought-provocation
potential, and syntactic complexity. We also propose a comprehensive evaluation
scheme for assessing VQA model performance on deeper cognitive tasks. To
address the limitations of existing video-language models (VLMs), we introduce
an agentic enhancement module, Agentic Choice Enhancement (ACE), which improves
model reasoning capabilities post-training by up to 25%. Our work contributes
to advancing movie understanding in AI systems and provides valuable insights
into the capabilities and limitations of current VQA models when faced with
more challenging, nuanced questions about cinematic content. Our project page,
dataset and code can be found at
https://joslefaure.github.io/assets/html/moviecore.html.