JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

August 9, 2023
Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang
cs.AI

Abstract

Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and the high sampling rates required. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model that incorporates both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs a variety of generation tasks, including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1
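
To make the abstract's task-unification claim concrete, below is a minimal sketch of how a single diffusion model could serve text-guided generation, inpainting, and continuation: each task is reduced to a binary mask over the latent sequence, with the unmasked audio supplied as in-context conditioning. Everything here (the make_task_mask/training_step names, the channel-concatenation layout, and the toy noise schedule) is an illustrative assumption, not the authors' implementation.

```python
# Sketch only: one denoiser handles generation, inpainting, and
# continuation via a binary mask over latent time steps (assumed design).
import torch
import torch.nn.functional as F

def make_task_mask(batch: int, length: int, task: str) -> torch.Tensor:
    """1 = positions to generate, 0 = positions given as audio context."""
    mask = torch.zeros(batch, 1, length)
    if task == "generation":
        mask[:] = 1.0                                     # synthesize everything
    elif task == "inpainting":
        mask[:, :, length // 4 : 3 * length // 4] = 1.0   # fill a middle gap
    elif task == "continuation":
        mask[:, :, length // 2 :] = 1.0                   # extend a given prefix
    return mask

def training_step(denoiser, latents, text_emb, task):
    """One diffusion training step; the clean context latents and the mask
    are concatenated to the noisy input along the channel axis."""
    b, c, t = latents.shape
    mask = make_task_mask(b, t, task)
    noise = torch.randn_like(latents)
    timestep = torch.randint(0, 1000, (b,))
    alpha = 1.0 - timestep.float().view(b, 1, 1) / 1000.0  # toy schedule
    noisy = alpha.sqrt() * latents + (1.0 - alpha).sqrt() * noise
    context = latents * (1.0 - mask)         # clean audio outside the mask
    model_in = torch.cat([noisy, context, mask], dim=1)
    pred = denoiser(model_in, timestep, text_emb)          # predicts the noise
    # penalize errors only where the model must generate
    return (F.mse_loss(pred, noise, reduction="none") * mask).mean()
```

Under the same framing, mixing autoregressive and non-autoregressive training could amount to toggling the denoiser's self-attention between a causal mask and a full bidirectional mask per batch; that detail is likewise an assumption here rather than a statement of the paper's method.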