Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
May 21, 2025
Authors: Chih-Kai Yang, Neo S. Ho, Hung-yi Lee
cs.AI
Abstract
With advancements in large audio-language models (LALMs), which enhance large
language models (LLMs) with auditory capabilities, these models are expected to
demonstrate universal proficiency across various auditory tasks. While numerous
benchmarks have emerged to assess LALMs' performance, they remain fragmented
and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive
survey and propose a systematic taxonomy for LALM evaluations, categorizing
them into four dimensions based on their objectives: (1) General Auditory
Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented
Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed
overviews within each category and highlight challenges in this field, offering
insights into promising future directions. To the best of our knowledge, this
is the first survey specifically focused on the evaluation of LALMs, providing
clear guidelines for the community. We will release the collection of the
surveyed papers and actively maintain it to support ongoing advancements in the
field.