IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
October 13, 2025
Authors: Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan
cs.AI
Abstract
Instruction-guided video editing has emerged as a rapidly advancing research
direction, offering new opportunities for intuitive content transformation
while also posing significant challenges for systematic evaluation. Existing
video editing benchmarks do not adequately support the evaluation of
instruction-guided video editing, and they further suffer from limited source
diversity, narrow task coverage, and incomplete evaluation metrics. To address
these limitations, we introduce IVEBench, a modern benchmark suite specifically
designed for instruction-guided video editing assessment. IVEBench comprises a
diverse database of 600 high-quality source videos that spans seven semantic
dimensions and covers video lengths ranging from 32 to 1,024 frames. It
further includes 8 categories of editing tasks with 35 subcategories, whose
prompts are generated and refined through large language models and expert
review. Crucially, IVEBench establishes a three-dimensional evaluation protocol
encompassing video quality, instruction compliance and video fidelity,
integrating both traditional metrics and multimodal large language model-based
assessments. Extensive experiments demonstrate the effectiveness of IVEBench in
benchmarking state-of-the-art instruction-guided video editing methods, showing
its ability to provide comprehensive and human-aligned evaluation outcomes.