AudioSR: Versatile Audio Super-resolution at Scale

September 13, 2023
Authors: Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley
cs.AI

Abstract

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods are limited in the range of audio types they cover (e.g., music, speech) and in the specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal with a bandwidth between 2kHz and 16kHz to a high-resolution audio signal with a 24kHz bandwidth at a sampling rate of 48kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong results achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can act as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen. Our code and demo are available at https://audioldm.github.io/audiosr.
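The bandwidth figures above follow from the Nyquist relation: audio sampled at 48kHz can represent content only up to 24kHz, so "2kHz to 16kHz input bandwidth" means the signal carries no energy above some cutoff in that range. The sketch below illustrates this by simulating a band-limited input with a brick-wall FFT low-pass filter and measuring the highest frequency present. It is not AudioSR's code or its data pipeline. The helper names `simulate_low_resolution` and `effective_bandwidth`, the brick-wall filter, and the relative detection threshold are all hypothetical choices made for illustration.

```python
import numpy as np

# Output format stated in the AudioSR abstract.
TARGET_SAMPLE_RATE = 48_000
# Nyquist: a signal sampled at fs Hz can represent content only up to fs / 2 Hz.
TARGET_BANDWIDTH = TARGET_SAMPLE_RATE / 2  # 24 kHz
# Input bandwidth range quoted in the abstract.
MIN_INPUT_BW, MAX_INPUT_BW = 2_000, 16_000


def simulate_low_resolution(signal: np.ndarray, sample_rate: int, cutoff_hz: float) -> np.ndarray:
    """Hypothetical helper: band-limit `signal` with a brick-wall FFT low-pass.

    Every frequency bin above `cutoff_hz` is zeroed, so the result keeps the
    original sampling rate but carries no energy above the cutoff.
    """
    if not MIN_INPUT_BW <= cutoff_hz <= MAX_INPUT_BW:
        raise ValueError(f"cutoff {cutoff_hz} Hz is outside the 2-16 kHz input range")
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def effective_bandwidth(signal: np.ndarray, sample_rate: int, rel_threshold: float = 1e-6) -> float:
    """Hypothetical helper: highest frequency whose magnitude exceeds
    `rel_threshold` times the strongest spectral peak."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    active = freqs[magnitude > rel_threshold * magnitude.max()]
    return float(active.max())


if __name__ == "__main__":
    # One second of three test tones at 1 kHz, 6 kHz and 20 kHz, sampled at 48 kHz.
    t = np.arange(TARGET_SAMPLE_RATE) / TARGET_SAMPLE_RATE
    full_band = (np.sin(2 * np.pi * 1_000 * t)
                 + 0.5 * np.sin(2 * np.pi * 6_000 * t)
                 + 0.25 * np.sin(2 * np.pi * 20_000 * t))
    # Keep only content at or below 4 kHz, i.e. an input inside the 2-16 kHz range.
    low_res = simulate_low_resolution(full_band, TARGET_SAMPLE_RATE, cutoff_hz=4_000)
    print(f"Nyquist bandwidth at 48 kHz: {TARGET_BANDWIDTH:.0f} Hz")
    print(f"full-band content up to ~{effective_bandwidth(full_band, TARGET_SAMPLE_RATE):.0f} Hz")
    print(f"low-res content up to ~{effective_bandwidth(low_res, TARGET_SAMPLE_RATE):.0f} Hz")
```

In this picture, a super-resolution model is asked to restore plausible content between the input cutoff and 24kHz while keeping the 48kHz sampling rate. A brick-wall filter is only one simple way to create such inputs; it is not claimed to match how the paper prepares its training or evaluation data.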