AudioSR: Versatile Audio Super-resolution at Scale

Abstract

Audio super-resolution is a fundamental task that predicts the high-frequency components of low-resolution audio, enhancing audio quality in digital applications. Previous methods are limited in the audio types they cover (e.g., music, speech) and in the specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce AudioSR, a diffusion-based generative model capable of robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal with a bandwidth between 2kHz and 16kHz to a high-resolution audio signal with 24kHz bandwidth at a sampling rate of 48kHz. Extensive objective evaluations on various audio super-resolution benchmarks demonstrate the strong results achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can act as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen. Our code and demo are available at https://audioldm.github.io/audiosr.
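The bandwidth figures in the abstract follow from the Nyquist relation: a signal sampled at 48kHz can represent frequencies up to 24kHz. The short sketch below only illustrates that arithmetic and the frequency band a super-resolution model would have to fill in for a given input. It is not part of the AudioSR code base; the function and constant names are hypothetical, and only the numeric values come from the abstract.

```python
# Illustrative arithmetic only; names are hypothetical, not the AudioSR API.

# Values stated in the abstract.
TARGET_SAMPLE_RATE_HZ = 48_000
MIN_INPUT_BANDWIDTH_HZ = 2_000
MAX_INPUT_BANDWIDTH_HZ = 16_000


def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at a sampling rate."""
    return sample_rate_hz / 2.0


def missing_band_hz(input_bandwidth_hz: float) -> tuple[float, float]:
    """Return the (low, high) band, in Hz, absent from the input signal.

    Raises ValueError if the input bandwidth lies outside the 2kHz to
    16kHz range quoted in the abstract.
    """
    if not MIN_INPUT_BANDWIDTH_HZ <= input_bandwidth_hz <= MAX_INPUT_BANDWIDTH_HZ:
        raise ValueError(
            f"input bandwidth {input_bandwidth_hz} Hz is outside the "
            f"{MIN_INPUT_BANDWIDTH_HZ}-{MAX_INPUT_BANDWIDTH_HZ} Hz range"
        )
    return (input_bandwidth_hz, nyquist_bandwidth_hz(TARGET_SAMPLE_RATE_HZ))


if __name__ == "__main__":
    # Audio originally sampled at 8kHz carries content up to 4kHz.
    low, high = missing_band_hz(nyquist_bandwidth_hz(8_000))
    print(f"band to generate: {low:.0f} Hz to {high:.0f} Hz")
```

For example, a recording originally sampled at 8kHz has a 4kHz bandwidth, so the band from 4kHz up to 24kHz is the part that has to be predicted.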
