MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
October 18, 2023
Authors: Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian
cs.AI
Abstract
AI-empowered music processing is a diverse field that encompasses dozens of
tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension
tasks (e.g., music classification). For developers and amateurs, it is very
difficult to master all of these tasks to meet their music-processing needs,
especially given how widely music data representations and the applicability
of models across platforms differ from task to task.
Consequently, it is necessary to build a system to organize and integrate these
tasks, and thus help practitioners automatically analyze their demands and
call suitable tools to fulfill their requirements. Inspired by the
recent success of large language models (LLMs) in task automation, we develop a
system, named MusicAgent, which integrates numerous music-related tools and an
autonomous workflow to address user requirements. More specifically, we build
1) a toolset that collects tools from diverse sources, including Hugging Face,
GitHub, and Web APIs; and 2) an autonomous workflow empowered by LLMs (e.g.,
ChatGPT) that organizes these tools, automatically decomposes user requests into
multiple sub-tasks, and invokes the corresponding music tools. The primary goal of
this system is to free users from the intricacies of AI-music tools, enabling
them to concentrate on the creative aspect. By granting users the freedom to
effortlessly combine tools, the system offers a seamless and enriching music
experience.
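
The two components named in the abstract, a toolset and a workflow that decomposes a request into sub-tasks and invokes tools, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tool names, the `Tool` class, and the keyword-based `plan` function are all hypothetical stand-ins, and the planning step that the abstract assigns to an LLM is replaced here by simple string matching so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """One toolset entry: a named callable plus the source it came from."""
    name: str
    source: str  # e.g. "Hugging Face", "GitHub", "Web API"
    run: Callable[[str], str]


def build_toolset() -> Dict[str, Tool]:
    # Hypothetical placeholder tools; real ones would wrap models or remote APIs.
    tools = [
        Tool("lyric-generation", "Hugging Face", lambda x: f"lyrics({x})"),
        Tool("singing-voice-synthesis", "GitHub", lambda x: f"audio({x})"),
        Tool("music-classification", "Web API", lambda x: f"genre({x})"),
    ]
    return {tool.name: tool for tool in tools}


def plan(request: str) -> List[str]:
    """Stub planner (assumption): keyword matching stands in for the LLM."""
    text = request.lower()
    steps: List[str] = []
    if "song" in text or "lyrics" in text:
        steps.append("lyric-generation")
    if "song" in text or "sing" in text:
        steps.append("singing-voice-synthesis")
    if "genre" in text or "classify" in text:
        steps.append("music-classification")
    return steps


def run_workflow(request: str, toolset: Dict[str, Tool]) -> List[str]:
    """Decompose the request, then feed each tool the previous tool's output."""
    data = request
    trace: List[str] = []
    for name in plan(request):
        tool = toolset[name]
        data = tool.run(data)
        trace.append(f"{name} [{tool.source}] -> {data}")
    return trace


if __name__ == "__main__":
    for line in run_workflow("Write a song about rain", build_toolset()):
        print(line)
```

Keeping the planner separate from the toolset means the string-matching stub could be swapped for an LLM call without touching the tool registry, which is the separation the abstract describes.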