Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
May 21, 2025
Authors: Chih-Kai Yang, Neo S. Ho, Hung-yi Lee
cs.AI
Abstract
With advancements in large audio-language models (LALMs), which enhance large
language models (LLMs) with auditory capabilities, these models are expected to
demonstrate universal proficiency across various auditory tasks. While numerous
benchmarks have emerged to assess LALMs' performance, they remain fragmented
and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive
survey and propose a systematic taxonomy for LALM evaluations, categorizing
them into four dimensions based on their objectives: (1) General Auditory
Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented
Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed
overviews within each category and highlight challenges in this field, offering
insights into promising future directions. To the best of our knowledge, this
is the first survey specifically focused on the evaluations of LALMs, providing
clear guidelines for the community. We will release the collection of the
surveyed papers and actively maintain it to support ongoing advancements in the
field.