Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

February 15, 2024
Authors: Hila Manor, Tomer Michaeli
cs.AI

Abstract

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples can be found on our examples page in https://hilamanor.github.io/AudioEditing/ and code can be found in https://github.com/hilamanor/AudioEditing/ .
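The DDPM inversion underlying both techniques can be illustrated with a toy sketch. The idea (following edit-friendly DDPM inversion) is to diffuse the input with independent noise at each timestep, then solve each reverse step for the noise map z_t that maps x_t back to x_{t-1}; resampling with those stored maps reconstructs the input exactly, and running the same maps through a differently conditioned denoiser produces the edit. The sketch below is a minimal assumption-laden illustration in NumPy: `eps_theta` is a dummy stand-in for the pretrained (in the paper, text-conditioned audio) diffusion model, and the "fixed large" variance `sigma_t = sqrt(beta_t)` is chosen so every step is stochastic and invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
beta = np.linspace(1e-4, 0.1, T)   # linear noise schedule (illustrative values)
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)
sigma = np.sqrt(beta)              # "fixed large" DDPM variance, nonzero at every step

def eps_theta(x, t):
    # Dummy noise predictor; the real method uses a pretrained
    # (text-conditioned) diffusion model here.
    return np.zeros_like(x)

def ddpm_mu(x_t, t):
    # Mean of the DDPM reverse step p(x_{t-1} | x_t).
    return (x_t - beta[t] / np.sqrt(1 - alpha_bar[t]) * eps_theta(x_t, t)) / np.sqrt(alpha[t])

def invert(x0):
    # Edit-friendly inversion: diffuse x0 with *independent* noise per
    # timestep, then solve each reverse step for the noise map z_t.
    xs = [x0] + [np.sqrt(alpha_bar[t]) * x0
                 + np.sqrt(1 - alpha_bar[t]) * rng.standard_normal(x0.shape)
                 for t in range(T)]
    zs = [(xs[t] - ddpm_mu(xs[t + 1], t)) / sigma[t] for t in range(T)]
    return xs[T], zs

def sample(x_T, zs):
    # Reverse diffusion reusing the recorded noise maps; swapping in a
    # differently conditioned denoiser here would yield the edit instead.
    x = x_T
    for t in reversed(range(T)):
        x = ddpm_mu(x, t) + sigma[t] * zs[t]
    return x

x0 = rng.standard_normal(8)        # stand-in for a (spectrogram) signal
xT, zs = invert(x0)
x0_rec = sample(xT, zs)
assert np.allclose(x0, x0_rec)     # exact reconstruction by construction
```

Reconstruction is exact by construction regardless of the denoiser, which is what makes the extracted noise maps a useful latent space for both text-based edits and the paper's unsupervised editing directions.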

