MovieCORE: COgnitive REasoning in Movies

Abstract

This paper introduces MovieCORE, a novel video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike existing datasets that focus on surface-level comprehension, MovieCORE emphasizes questions that engage System-2 thinking while remaining specific to the video material. We present an innovative agentic brainstorming approach, utilizing multiple large language models (LLMs) as thought agents to generate and refine high-quality question-answer pairs. To evaluate dataset quality, we develop a set of cognitive tests assessing depth, thought-provocation potential, and syntactic complexity. We also propose a comprehensive evaluation scheme for assessing VQA model performance on deeper cognitive tasks. To address the limitations of existing video-language models (VLMs), we introduce an agentic enhancement module, Agentic Choice Enhancement (ACE), which improves model reasoning capabilities post-training by up to 25%. Our work contributes to advancing movie understanding in AI systems and provides valuable insights into the capabilities and limitations of current VQA models when faced with more challenging, nuanced questions about cinematic content. Our project page, dataset and code can be found at https://joslefaure.github.io/assets/html/moviecore.html.
