Foundation Models for Music: A Survey
August 26, 2024
Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang
cs.AI
Abstract
In recent years, foundation models (FMs) such as large language models (LLMs)
and latent diffusion models (LDMs) have profoundly impacted diverse sectors,
including music. This comprehensive review examines state-of-the-art (SOTA)
pre-trained models and foundation models in music, spanning representation
learning, generative learning, and multimodal learning. We first contextualise
the significance of music in various industries and trace the evolution of AI
in music. By delineating the modalities targeted by foundation models, we
find that many music representations remain underexplored in FM development.
We then highlight the lack of versatility of previous methods across
diverse music applications, along with the potential of FMs in music
understanding, generation, and medical applications. By comprehensively
exploring the details of model pre-training paradigms, architectural choices,
tokenisation, finetuning methodologies, and controllability, we emphasise
important topics that deserve thorough exploration, such as instruction tuning
and in-context learning, scaling laws and emergent abilities, and
long-sequence modelling. A dedicated section presents insights into music
agents, accompanied by a thorough analysis of the datasets and evaluations
essential for pre-training and downstream tasks. Finally, by underscoring the
vital importance of ethical considerations, we advocate that future research
on FMs for music should focus more on issues such as interpretability,
transparency, human responsibility, and copyright. The paper offers
insights into future challenges and trends in FMs for music, aiming to shape
the trajectory of human-AI collaboration in the music realm.