AudioSR: Versatile Audio Super-resolution at Scale

Abstract

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods are limited both in the range of audio types they cover (e.g., music, speech) and in the specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal within the bandwidth range of 2kHz to 16kHz to a high-resolution audio signal at 24kHz bandwidth with a sampling rate of 48kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong results achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can act as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen. Our code and demo are available at https://audioldm.github.io/audiosr.
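The bandwidth figures in the abstract follow from the Nyquist relation: a signal sampled at 48kHz can represent frequencies up to 24kHz. The short Python sketch below only illustrates that arithmetic and the band a super-resolution model would have to predict for a given input. The function names and the range check are hypothetical helpers written for this illustration; they are not part of the AudioSR code base.

```python
# Illustrative sketch only: these helpers are assumptions, not the AudioSR API.

# Input bandwidth range and target sampling rate as stated in the abstract.
MIN_INPUT_BANDWIDTH_HZ = 2_000
MAX_INPUT_BANDWIDTH_HZ = 16_000
TARGET_SAMPLE_RATE_HZ = 48_000


def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2.0


def band_to_predict_hz(input_bandwidth_hz: float) -> tuple[float, float]:
    """Return the (low, high) frequency band missing from the input.

    Raises ValueError if the input bandwidth lies outside the
    2kHz to 16kHz range quoted in the abstract.
    """
    if not MIN_INPUT_BANDWIDTH_HZ <= input_bandwidth_hz <= MAX_INPUT_BANDWIDTH_HZ:
        raise ValueError("input bandwidth outside the 2kHz to 16kHz range")
    return (float(input_bandwidth_hz), nyquist_bandwidth_hz(TARGET_SAMPLE_RATE_HZ))


if __name__ == "__main__":
    # An input band-limited to 4kHz leaves the 4kHz to 24kHz band to be generated.
    print(band_to_predict_hz(4_000))
```

Under these assumptions, an input band-limited to 4kHz leaves the band from 4kHz to 24kHz to be generated, while a 16kHz input leaves only the top 8kHz.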
