IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment

Abstract

Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while posing significant challenges for systematic evaluation. Existing video editing benchmarks do not adequately support the evaluation of instruction-guided video editing, and they further suffer from limited source diversity, narrow task coverage, and incomplete evaluation metrics. To address these limitations, we introduce IVEBench, a modern benchmark suite designed specifically for assessing instruction-guided video editing. IVEBench comprises a diverse database of 600 high-quality source videos that span seven semantic dimensions and range in length from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated with large language models and refined through expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol covering video quality, instruction compliance, and video fidelity, integrating both traditional metrics and assessments based on multimodal large language models. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods and show that it provides comprehensive, human-aligned evaluation results.
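The dataset constraints and the three-dimensional protocol summarized above can be pictured as a small data model. The Python sketch below is hypothetical: the class names, field names, category labels, score scale, and per-dimension averaging are assumptions made for illustration, not IVEBench's actual code or scoring rule. It encodes only what the abstract states: source clips of 32 to 1,024 frames, a category and subcategory label per editing task, and separate scores for video quality, instruction compliance, and video fidelity.

```python
from dataclasses import dataclass
from statistics import mean

# Evaluation dimensions named in the abstract; the identifiers are hypothetical.
DIMENSIONS = ("video_quality", "instruction_compliance", "video_fidelity")

# Source-video length range stated in the abstract (in frames).
MIN_FRAMES, MAX_FRAMES = 32, 1024


@dataclass
class EditCase:
    """Hypothetical benchmark entry: one source video paired with one instruction."""

    video_id: str
    num_frames: int
    category: str     # one of the 8 task categories (labels assumed)
    subcategory: str  # one of the 35 subcategories (labels assumed)
    instruction: str

    def __post_init__(self) -> None:
        # Reject clips outside the stated 32 to 1,024 frame range.
        if not MIN_FRAMES <= self.num_frames <= MAX_FRAMES:
            raise ValueError(
                f"{self.video_id}: {self.num_frames} frames outside "
                f"[{MIN_FRAMES}, {MAX_FRAMES}]"
            )


@dataclass
class DimensionScores:
    """Hypothetical per-case scores, one float per dimension (scale assumed)."""

    scores: dict[str, float]

    def __post_init__(self) -> None:
        # Every case must be scored on all three dimensions.
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")


def summarize(results: list[DimensionScores]) -> dict[str, float]:
    """Average each dimension separately instead of merging them into one score."""
    return {dim: mean(r.scores[dim] for r in results) for dim in DIMENSIONS}


if __name__ == "__main__":
    case = EditCase("vid_0001", 64, "category_a", "subcategory_a1",
                    "Make the sky look like a sunset.")
    results = [
        DimensionScores({"video_quality": 0.75, "instruction_compliance": 0.5,
                         "video_fidelity": 1.0}),
        DimensionScores({"video_quality": 0.25, "instruction_compliance": 1.0,
                         "video_fidelity": 0.5}),
    ]
    print(case.video_id, summarize(results))
```

Keeping the three dimensions as separate averages, rather than collapsing them into a single number, mirrors the abstract's framing of evaluation along three distinct axes; how IVEBench itself combines or weights them is not stated here.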