JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

August 9, 2023
Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang
cs.AI

Abstract

Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1
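The abstract does not detail JEN-1's sampler or how its in-context learning conditions the diffusion process, so the following is an illustrative sketch only: a minimal PyTorch reverse step showing how mask-based conditioning can steer a latent diffusion sampler toward inpainting and continuation by clamping masked frames to a re-noised reference latent (a RePaint-style approach). The model callable, noise schedule, and epsilon/DDIM parameterization here are all assumptions, not JEN-1's actual implementation.

import torch

# Hypothetical DDIM-style noise schedule; the abstract does not specify
# JEN-1's schedule, parameterization, or sampler.
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

@torch.no_grad()
def masked_reverse_step(model, z_t, t, text_emb, ref_latent, mask):
    # mask == 1: latent frames fixed to the reference audio (the surrounding
    # context for inpainting, the prefix for continuation).
    # mask == 0: frames the model must generate. An all-zero mask reduces
    # this to plain text-guided generation.
    eps = model(z_t, t, text_emb)  # predicted noise (epsilon-parameterization assumed)
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
    # Deterministic DDIM update: estimate the clean latent, then step back.
    z0_hat = (z_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    z_prev = a_prev.sqrt() * z0_hat + (1 - a_prev).sqrt() * eps
    # Re-noise the reference to timestep t-1 and clamp the known frames,
    # so only the masked-out region is actually generated.
    ref_noised = a_prev.sqrt() * ref_latent + (1 - a_prev).sqrt() * torch.randn_like(ref_latent)
    return mask * ref_noised + (1 - mask) * z_prev

Under this sketch, the three tasks named in the abstract differ only in the mask: inpainting sets mask = 1 on the frames surrounding the gap, continuation sets mask = 1 on the prefix frames, and text-guided generation from scratch uses an all-zero mask.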