APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music

Abstract

Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, in which a flood of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. A key yet unexplored factor in this setting is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music. Trained on over 211k songs (10k hours of audio) from Suno and Udio, APEX jointly predicts engagement-based popularity signals (streams and likes scores) alongside five perceptual aesthetic quality dimensions, using frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, which comprises pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.
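The multi-task setup described above can be illustrated with a minimal sketch. It assumes a single shared layer over a precomputed frozen embedding and one linear head per target; the abstract does not specify the architecture, the loss, or the names of the five aesthetic dimensions, so the task names, layer sizes, squared-error loss, and the score-sum preference rule below are all hypothetical placeholders rather than the authors' method.

```python
import random

# Hypothetical task names: the abstract mentions streams, likes, and five
# aesthetic dimensions, but does not name the dimensions.
TASKS = ["streams", "likes"] + [f"aesthetic_{i}" for i in range(1, 6)]


def make_model(embed_dim, hidden_dim, seed=0):
    """Create a shared linear trunk plus one linear head per task."""
    rng = random.Random(seed)
    trunk = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
             for _ in range(hidden_dim)]
    heads = {t: [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
             for t in TASKS}
    return {"trunk": trunk, "heads": heads}


def forward(model, embedding):
    """Map one frozen embedding to a dict of per-task scalar predictions."""
    # Shared representation (ReLU over a linear projection).
    hidden = [max(0.0, sum(w * x for w, x in zip(row, embedding)))
              for row in model["trunk"]]
    # Each task reads the same shared representation through its own head.
    return {t: sum(w * h for w, h in zip(head, hidden))
            for t, head in model["heads"].items()}


def multitask_loss(predictions, targets, weights=None):
    """Weighted sum of per-task squared errors for one example (assumed loss)."""
    weights = weights or {t: 1.0 for t in TASKS}
    return sum(weights[t] * (predictions[t] - targets[t]) ** 2 for t in TASKS)


def predict_preference(model, emb_a, emb_b, score_tasks=("streams", "likes")):
    """Pick the preferred track of a pair by comparing summed predicted scores.

    This decision rule is an illustrative assumption; ties go to 'a'.
    """
    pred_a, pred_b = forward(model, emb_a), forward(model, emb_b)
    score_a = sum(pred_a[t] for t in score_tasks)
    score_b = sum(pred_b[t] for t in score_tasks)
    return "a" if score_a >= score_b else "b"


# Usage: a constant vector stands in for a frozen MERT embedding.
model = make_model(embed_dim=8, hidden_dim=4)
embedding = [0.5] * 8
print(forward(model, embedding))
```

Passing `score_tasks=TASKS` to `predict_preference` would fold the aesthetic predictions into the pairwise score, which is one simple way to compare preference prediction with and without aesthetic information under these assumptions.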
