
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

October 13, 2025
Authors: Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan
cs.AI

Abstract

Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while also posing significant challenges for systematic evaluation. Existing video editing benchmarks fail to adequately support the evaluation of instruction-guided video editing and further suffer from limited source diversity, narrow task coverage, and incomplete evaluation metrics. To address the above limitations, we introduce IVEBench, a modern benchmark suite specifically designed for instruction-guided video editing assessment. IVEBench comprises a diverse database of 600 high-quality source videos, spanning seven semantic dimensions and covering video lengths ranging from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated and refined through large language models and expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance, and video fidelity, integrating both traditional metrics and multimodal large language model-based assessments. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods, showing its ability to provide comprehensive and human-aligned evaluation outcomes.
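To make the three-dimensional protocol concrete, a multi-metric benchmark of this shape typically averages normalized per-metric scores within each dimension and then combines the dimensions into one overall number. The sketch below is purely illustrative: the dimension names follow the abstract (video quality, instruction compliance, video fidelity), but the specific metrics, weights, and averaging scheme are assumptions of ours, not IVEBench's actual protocol.

```python
def aggregate_scores(dimension_scores: dict, weights: dict = None) -> float:
    """Hypothetical aggregation: average per-metric scores (each assumed
    normalized to [0, 1]) within each dimension, then take a weighted
    mean across dimensions. Not the official IVEBench formula."""
    if weights is None:
        weights = {d: 1.0 for d in dimension_scores}  # equal weighting by default
    # Mean of the metrics inside each dimension.
    per_dim = {d: sum(s) / len(s) for d, s in dimension_scores.items()}
    total_w = sum(weights[d] for d in per_dim)
    return sum(weights[d] * per_dim[d] for d in per_dim) / total_w


# Illustrative scores for one edited video (values are made up).
scores = {
    "video_quality": [0.82, 0.76],          # e.g. traditional quality metrics
    "instruction_compliance": [0.64],       # e.g. an MLLM-based judgment
    "video_fidelity": [0.91, 0.88],         # e.g. similarity to the source video
}
overall = aggregate_scores(scores)
```

Reporting per-dimension means alongside the overall score preserves the protocol's diagnostic value: a method can score well on fidelity while failing instruction compliance, which a single scalar would hide.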
PDF · October 14, 2025