Simple and Controllable Music Generation

Abstract

We tackle the task of conditional music generation. We introduce MusicGen, a single language model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need to cascade several models, e.g., hierarchically or through upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples while being conditioned on textual descriptions or melodic features, allowing better control over the generated output. We conduct an extensive empirical evaluation, considering both automatic and human studies, showing that the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft.
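To make the idea of interleaving several token streams concrete, the following is a minimal sketch of one possible pattern, assuming a "delay"-style layout in which stream k is shifted right by k steps so that a single model can predict all streams in one pass. The abstract does not specify the pattern; the function names, the `PAD` value, and the list-of-lists layout are hypothetical choices made for illustration and are not taken from the audiocraft code.

```python
from typing import List

PAD = -1  # hypothetical padding token id, chosen for illustration only


def delay_interleave(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift token stream k right by k steps (assumed delay-style pattern).

    `codes` holds K parallel streams of T tokens each. The result holds
    K streams of T + K - 1 tokens, padded where no token is available.
    """
    k_streams = len(codes)
    if k_streams == 0:
        return []
    t_steps = len(codes[0])
    if any(len(stream) != t_steps for stream in codes):
        raise ValueError("all streams must have the same length")
    delayed = []
    for k, stream in enumerate(codes):
        # k pads in front, K - 1 - k pads behind, so every row has equal length.
        delayed.append([pad] * k + list(stream) + [pad] * (k_streams - 1 - k))
    return delayed


def delay_deinterleave(delayed: List[List[int]]) -> List[List[int]]:
    """Undo delay_interleave by slicing each stream back to its T tokens."""
    k_streams = len(delayed)
    if k_streams == 0:
        return []
    t_steps = len(delayed[0]) - (k_streams - 1)
    return [row[k:k + t_steps] for k, row in enumerate(delayed)]


if __name__ == "__main__":
    streams = [[1, 2, 3], [4, 5, 6]]
    shifted = delay_interleave(streams)
    print(shifted)
    print(delay_deinterleave(shifted) == streams)
```

Under this assumed layout, each column of the shifted grid can be emitted in one decoding step, so K streams of T tokens need T + K - 1 steps rather than K × T steps when all tokens are flattened into one sequence.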
