
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

October 13, 2025
Authors: Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan
cs.AI

Abstract

Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while also posing significant challenges for systematic evaluation. Existing video editing benchmarks fail to adequately support the evaluation of instruction-guided video editing, and further suffer from limited source diversity, narrow task coverage, and incomplete evaluation metrics. To address these limitations, we introduce IVEBench, a modern benchmark suite specifically designed for instruction-guided video editing assessment. IVEBench comprises a diverse database of 600 high-quality source videos spanning seven semantic dimensions, with video lengths ranging from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated through large language models and refined by expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance, and video fidelity, integrating both traditional metrics and multimodal large language model-based assessments. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods, showing its ability to provide comprehensive and human-aligned evaluation outcomes.
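To make the shape of such a three-dimensional protocol concrete, the sketch below averages per-metric scores within each evaluation axis (video quality, instruction compliance, video fidelity) into one score per dimension. All metric names and values here are hypothetical illustrations, not IVEBench's actual metrics or weighting scheme.

```python
from statistics import mean

# Hypothetical per-metric scores for one edited video, grouped by the three
# evaluation dimensions the benchmark describes (values are illustrative only).
scores = {
    "video_quality": {"aesthetic": 0.81, "temporal_consistency": 0.92},
    "instruction_compliance": {"mllm_alignment": 0.74},
    "video_fidelity": {"background_preservation": 0.88, "motion_fidelity": 0.79},
}

def dimension_scores(metric_scores):
    """Collapse the metrics within each dimension to one averaged score per axis."""
    return {dim: round(mean(m.values()), 3) for dim, m in metric_scores.items()}

report = dimension_scores(scores)
print(report)
# → {'video_quality': 0.865, 'instruction_compliance': 0.74, 'video_fidelity': 0.835}
```

A real protocol would likely weight metrics differently per task category and fold in MLLM-based judgments alongside traditional metrics; the equal-weight average above is only the simplest possible aggregation.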
PDF · October 14, 2025