Foundation Models for Music: A Survey
August 26, 2024
Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang
cs.AI
Abstract
In recent years, foundation models (FMs) such as large language models (LLMs)
and latent diffusion models (LDMs) have profoundly impacted diverse sectors,
including music. This comprehensive review examines state-of-the-art (SOTA)
pre-trained models and foundation models in music, spanning representation
learning, generative learning, and multimodal learning. We first contextualise
the significance of music in various industries and trace the evolution of AI
in music. By delineating the modalities targeted by foundation models, we
find that many music representations remain underexplored in FM development.
We then highlight the limited versatility of previous methods across
diverse music applications, along with the potential of FMs in music
understanding, generation, and medical applications. By comprehensively exploring
the details of the model pre-training paradigm, architectural choices,
tokenisation, fine-tuning methodologies, and controllability, we identify
important topics that deserve deeper exploration, such as instruction tuning
and in-context learning, scaling laws and emergent abilities, and
long-sequence modelling. A dedicated section presents insights into music
agents, accompanied by a thorough analysis of datasets and evaluations
essential for pre-training and downstream tasks. Finally, by underscoring the
vital importance of ethical considerations, we advocate that future research
on FMs for music should focus more on issues such as interpretability,
transparency, human responsibility, and copyright. The paper offers
insights into future challenges and trends in FMs for music, aiming to shape
the trajectory of human-AI collaboration in the music realm.