MovieCORE: COgnitive REasoning in Movies
August 26, 2025
Authors: Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu
cs.AI
Abstract
This paper introduces MovieCORE, a novel video question answering (VQA)
dataset designed to probe deeper cognitive understanding of movie content.
Unlike existing datasets that focus on surface-level comprehension, MovieCORE
emphasizes questions that engage System-2 thinking while remaining specific to
the video material. We present an innovative agentic brainstorming approach,
utilizing multiple large language models (LLMs) as thought agents to generate
and refine high-quality question-answer pairs. To evaluate dataset quality, we
develop a set of cognitive tests assessing depth, thought-provocation
potential, and syntactic complexity. We also propose a comprehensive evaluation
scheme for assessing VQA model performance on deeper cognitive tasks. To
address the limitations of existing video-language models (VLMs), we introduce
an agentic enhancement module, Agentic Choice Enhancement (ACE), which is applied
post-training and improves model reasoning capabilities by up to 25%. Our work contributes
to advancing movie understanding in AI systems and provides valuable insights
into the capabilities and limitations of current VQA models when faced with
more challenging, nuanced questions about cinematic content. Our project page,
dataset and code can be found at
https://joslefaure.github.io/assets/html/moviecore.html.