APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music
May 5, 2026
Authors: Jaavid Aktar Husain, Dorien Herremans
cs.AI
Abstract
Music popularity prediction has attracted growing research interest, given its relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, in which a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. A key factor, as yet unexplored in this pursuit, is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music. Trained on over 211k songs (10k hours of audio) from Suno and Udio, APEX jointly predicts engagement-based popularity signals (streams and likes scores) alongside five perceptual aesthetic quality dimensions, using frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, which comprises pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction accuracy, demonstrating strong generalisation of the learned representations across generative architectures.
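The multi-task setup described above (a frozen audio embedding feeding separate heads for popularity signals and aesthetic scores) can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual architecture: the layer sizes, head names, and the 768-dimensional embedding are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 768-d frozen embedding (typical of MERT-style
# encoders) and a small shared trunk. All sizes here are illustrative.
EMB_DIM, HIDDEN = 768, 128
AESTHETIC_DIMS = 5  # the five perceptual aesthetic quality dimensions


def relu(x):
    return np.maximum(x, 0.0)


class MultiTaskHead:
    """Shared trunk over a frozen audio embedding, with separate linear
    heads for popularity signals (streams, likes) and aesthetic scores."""

    def __init__(self):
        self.W_shared = rng.normal(0.0, 0.02, (EMB_DIM, HIDDEN))
        self.W_streams = rng.normal(0.0, 0.02, (HIDDEN, 1))
        self.W_likes = rng.normal(0.0, 0.02, (HIDDEN, 1))
        self.W_aes = rng.normal(0.0, 0.02, (HIDDEN, AESTHETIC_DIMS))

    def __call__(self, emb):
        h = relu(emb @ self.W_shared)  # shared representation for all tasks
        return {
            "streams": (h @ self.W_streams).squeeze(-1),
            "likes": (h @ self.W_likes).squeeze(-1),
            "aesthetics": h @ self.W_aes,  # one score per aesthetic dimension
        }


model = MultiTaskHead()
frozen_emb = rng.normal(size=(4, EMB_DIM))  # a batch of 4 frozen embeddings
out = model(frozen_emb)
print(out["streams"].shape, out["aesthetics"].shape)  # (4,) (4, 5)
```

The design point the sketch captures is that the encoder stays frozen: only the shared trunk and the per-task heads would be trained, so the popularity and aesthetic objectives regularise one another through the shared layer.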