VABench: A Comprehensive Benchmark for Audio-Video Generation

December 10, 2025
Authors: Daili Hua, Xizhi Wang, Bohan Zeng, Xinyi Huang, Hao Liang, Junbo Niu, Xinlong Chen, Quanqing Xu, Wentao Zhang
cs.AI

Abstract

Recent advances in video generation have been remarkable, enabling models to produce visually compelling videos with synchronized audio. While existing video generation benchmarks provide comprehensive metrics for visual quality, they lack convincing evaluations for audio-video generation, especially for models aiming to generate synchronized audio-video outputs. To address this gap, we introduce VABench, a comprehensive and multi-dimensional benchmark framework designed to systematically evaluate the capabilities of synchronous audio-video generation. VABench encompasses three primary task types: text-to-audio-video (T2AV), image-to-audio-video (I2AV), and stereo audio-video generation. It further establishes two major evaluation modules covering 15 dimensions. These dimensions specifically assess pairwise similarities (text-video, text-audio, video-audio), audio-video synchronization, lip-speech consistency, and carefully curated audio and video question-answering (QA) pairs, among others. Furthermore, VABench covers seven major content categories: animals, human sounds, music, environmental sounds, synchronous physical sounds, complex scenes, and virtual worlds. We provide a systematic analysis and visualization of the evaluation results, aiming to establish a new standard for assessing video generation models with synchronous audio capabilities and to promote the comprehensive advancement of the field.
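
The abstract does not say how the pairwise similarities (text-video, text-audio, video-audio) are computed. As a minimal sketch, assuming each modality has already been mapped to a fixed-length embedding in a shared space, one common choice is cosine similarity over every modality pair. The function names, the `sample` dictionary, and the toy vectors below are hypothetical illustrations, not part of VABench.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarities(embeddings):
    """Map each unordered modality pair to the similarity of its embeddings."""
    return {
        (m1, m2): cosine_similarity(embeddings[m1], embeddings[m2])
        for m1, m2 in combinations(embeddings, 2)
    }


# Hypothetical toy embeddings (assumption): real ones would come from
# pretrained text, video, and audio encoders, which are not shown here.
sample = {
    "text": [1.0, 0.0, 0.0],
    "video": [1.0, 1.0, 0.0],
    "audio": [0.0, 1.0, 0.0],
}

scores = pairwise_similarities(sample)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A score near 1 means the two embeddings point in nearly the same direction; a score near 0 means they are close to orthogonal. How such scores would be aggregated into one of the 15 dimensions is not specified in the abstract.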