Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

February 15, 2024
Authors: Hila Manor, Tomer Michaeli
cs.AI

Abstract

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples can be found on our examples page at https://hilamanor.github.io/AudioEditing/ and code can be found at https://github.com/hilamanor/AudioEditing/.
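
The text-based technique relies on DDPM inversion: the source signal is mapped to a sequence of per-step diffusion noise vectors, which are then reused while sampling under the target text prompt. The sketch below illustrates this idea under our own assumptions; `eps_model`, `alpha_bar`, `src_emb`, `tgt_emb`, and `t_start` are hypothetical placeholders for a pretrained text-conditioned audio diffusion model and its noise schedule, not the authors' actual interface.

```python
# Minimal sketch of edit-friendly DDPM inversion + prompt-swap editing.
# Assumptions (not the authors' actual code): eps_model(x_t, t, emb) is a
# pretrained text-conditioned noise predictor, alpha_bar is a tensor of
# cumulative alphas with alpha_bar[0] == 1, and prompt embeddings are given.
import torch

def reverse_step_stats(x_t, t, eps_model, emb, alpha_bar):
    """Mean and std of one DDPM reverse step p(x_{t-1} | x_t)."""
    a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
    eps = eps_model(x_t, t, emb)
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    beta_t = 1 - a_t / a_prev
    mu = (a_prev.sqrt() * beta_t / (1 - a_t)) * x0_hat \
         + ((a_t / a_prev).sqrt() * (1 - a_prev) / (1 - a_t)) * x_t
    # Clamp guards the sigma == 0 case at t == 1.
    sigma = (beta_t * (1 - a_prev) / (1 - a_t)).sqrt().clamp_min(1e-5)
    return mu, sigma

def ddpm_invert(x0, eps_model, src_emb, alpha_bar, T):
    """Sample x_1..x_T independently from q(x_t | x_0), then solve for the
    per-step noise vectors z_t that reproduce the chain under DDPM sampling."""
    xs = [x0]
    for t in range(1, T + 1):
        eps = torch.randn_like(x0)
        xs.append(alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps)
    zs = {}
    for t in range(T, 0, -1):
        mu, sigma = reverse_step_stats(xs[t], t, eps_model, src_emb, alpha_bar)
        zs[t] = (xs[t - 1] - mu) / sigma
    return xs, zs

def edit_with_prompt(x0, eps_model, src_emb, tgt_emb, alpha_bar, T, t_start):
    """Invert under the source prompt, then resample from an intermediate
    timestep under the target prompt, reusing the inverted noise vectors."""
    xs, zs = ddpm_invert(x0, eps_model, src_emb, alpha_bar, T)
    x = xs[t_start]
    for t in range(t_start, 0, -1):
        mu, sigma = reverse_step_stats(x, t, eps_model, tgt_emb, alpha_bar)
        x = mu + sigma * zs[t]
    return x
```

Starting the resampling at an intermediate `t_start` rather than at `T` trades edit strength against fidelity to the source signal, which is why it is exposed as a parameter here.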
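The unsupervised technique discovers editing directions from the model itself. One plausible realization of this idea, sketched below and not necessarily the authors' exact procedure, extracts top principal components of the denoising posterior at an intermediate timestep via power iteration on the denoiser's Jacobian; `x0_hat_fn` is a hypothetical wrapper returning the posterior mean E[x_0 | x_t].

```python
# Hypothetical sketch: editing directions as top principal components of the
# denoising posterior p(x_0 | x_t), found by power iteration on the Jacobian
# of the posterior-mean denoiser. x0_hat_fn(x_t) stands in for a pretrained
# denoiser returning E[x_0 | x_t]; this illustrates the idea, not the paper's code.
import torch

def top_posterior_directions(x0_hat_fn, x_t, n_dirs=3, n_iters=50):
    dirs = []
    for _ in range(n_dirs):
        v = torch.randn_like(x_t)
        v = v / v.norm()
        for _ in range(n_iters):
            # The vector-Jacobian product gives J^T v; the MMSE denoiser's
            # Jacobian is symmetric, so this equals J v (one power step).
            _, jv = torch.autograd.functional.vjp(x0_hat_fn, x_t, v)
            for u in dirs:                      # deflate directions found so far
                jv = jv - (jv * u).sum() * u
            v = jv / jv.norm()
        dirs.append(v.detach())
    return dirs
```

A direction returned this way could then be added to x_t with a chosen strength before continuing the reverse diffusion, tracing out modifications like those described above (e.g., raising or lowering the prominence of an instrument).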