MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

Abstract


AI-empowered music processing is a diverse field that encompasses dozens of tasks, ranging from generation tasks (e.g., timbre synthesis) to comprehension tasks (e.g., music classification). For developers and amateurs, it is very difficult to grasp all of these tasks well enough to satisfy their music-processing requirements, especially given the large differences in music data representations and in model applicability across platforms from one task to another. Consequently, there is a need for a system that organizes and integrates these tasks and thereby helps practitioners automatically analyze their demands and invoke suitable tools to fulfill their requirements. Inspired by the recent success of large language models (LLMs) in task automation, we develop a system named MusicAgent, which integrates numerous music-related tools and an autonomous workflow to address user requirements. Specifically, we build 1) a toolset that collects tools from diverse sources, including Hugging Face, GitHub, and Web APIs, and 2) an autonomous workflow empowered by LLMs (e.g., ChatGPT) that organizes these tools, automatically decomposes user requests into multiple subtasks, and invokes the corresponding music tools. The primary goal of this system is to free users from the intricacies of AI music tools so that they can concentrate on the creative aspects of their work. By letting users combine tools effortlessly, the system offers a seamless and enriching music experience.
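The abstract does not specify MusicAgent's interfaces, so the following Python is only a minimal sketch, under stated assumptions, of the "decompose a request, then invoke the matching tools" loop it describes. Every name here is hypothetical: the task names, the `TOOLSET` dictionary, `fake_llm`, `decompose`, and `run` are illustrative and not part of MusicAgent. A canned JSON plan stands in for the LLM (e.g., ChatGPT), and the tools are string-returning stubs rather than real Hugging Face models, GitHub code, or Web APIs.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical tool signature: read a shared context, return new fields.
Tool = Callable[[Dict[str, Any]], Dict[str, Any]]

# Hypothetical toolset mapping a task name to a tool. In a real system each
# entry would wrap a Hugging Face model, GitHub code, or a Web API; here
# they are stubs that just build descriptive strings.
TOOLSET: Dict[str, Tool] = {
    "lyric-to-melody": lambda ctx: {"melody": f"melody<{ctx['lyrics']}>"},
    "singing-voice-synthesis": lambda ctx: {"audio": f"voice<{ctx['melody']}>"},
}


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM such as ChatGPT: always returns a fixed JSON plan."""
    return json.dumps([
        {"task": "lyric-to-melody"},
        {"task": "singing-voice-synthesis"},
    ])


def decompose(request: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM to split a request into an ordered list of subtask names."""
    prompt = (
        f"Available tasks: {sorted(TOOLSET)}\n"
        f"Split this request into subtasks as a JSON list: {request}"
    )
    steps = json.loads(llm(prompt))
    # Keep only subtasks that some tool in the toolset can handle.
    return [step["task"] for step in steps if step["task"] in TOOLSET]


def run(request: str, inputs: Dict[str, Any], llm: Callable[[str], str]) -> Dict[str, Any]:
    """Invoke each subtask's tool in order, merging outputs into a shared context."""
    ctx = dict(inputs)
    for task in decompose(request, llm):
        ctx.update(TOOLSET[task](ctx))
    return ctx


result = run("Write a melody for these lyrics and sing it", {"lyrics": "hello world"}, fake_llm)
print(result["audio"])  # -> voice<melody<hello world>>
```

Passing every tool's output through one shared context dictionary is a simplifying choice: it lets a later subtask (here, voice synthesis) consume what an earlier one produced (the melody) without the planner wiring explicit dependencies. A fuller system would more likely have the LLM name each subtask's inputs and check the plan before running it.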