
MAEB: Massive Audio Embedding Benchmark

February 17, 2026
Authors: Adnan El Assadi, Isaac Chung, Chenghao Xiao, Roman Solomatin, Animesh Jha, Rahul Chand, Silky Singh, Kaitlyn Wang, Ali Sartaz Khan, Marc Moussa Nasser, Sufen Fong, Pengfei He, Alan Xiao, Ayush Sunil Munot, Aditya Shrivastava, Artem Gazizov, Niklas Muennighoff, Kenneth Enevoldsen
cs.AI

Abstract

We introduce the Massive Audio Embedding Benchmark (MAEB), a large-scale benchmark covering 30 tasks across speech, music, environmental sounds, and cross-modal audio-text reasoning in 100+ languages. We evaluate 50+ models and find that no single model dominates across all tasks: contrastive audio-text models excel at environmental sound classification (e.g., ESC50) but score near random on multilingual speech tasks (e.g., SIB-FLEURS), while speech-pretrained models show the opposite pattern. Clustering remains challenging for all models, with even the best-performing model achieving only modest results. Models that excel at acoustic understanding often perform poorly on linguistic tasks, and vice versa. We also show that an audio encoder's performance on MAEB correlates highly with its performance when used in audio large language models. MAEB is derived from MAEB+, a collection of 98 tasks, and is designed to maintain task diversity while reducing evaluation cost; it integrates into the MTEB ecosystem for unified evaluation across text, image, and audio modalities. We release MAEB and all 98 tasks, along with code and a leaderboard, at https://github.com/embeddings-benchmark/mteb.