SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

August 5, 2025
Authors: Jan Melechovsky, Ambuj Mehrish, Dorien Herremans
cs.AI

Abstract

Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image, especially when created in non-professional settings without specialized equipment or expertise. These problems are typically corrected using separate specialized tools and manual adjustments. In this paper, we introduce SonicMaster, the first unified generative model for music restoration and mastering that addresses a broad spectrum of audio artifacts with text-based control. SonicMaster is conditioned on natural language instructions to apply targeted enhancements, or can operate in an automatic mode for general restoration. To train this model, we construct the SonicMaster dataset, a large dataset of paired degraded and high-quality tracks, built by simulating common degradation types with nineteen degradation functions belonging to five enhancement groups: equalization, dynamics, reverb, amplitude, and stereo. Our approach leverages a flow-matching generative training paradigm to learn an audio transformation that maps degraded inputs to their cleaned, mastered versions, guided by text prompts. Objective audio quality metrics demonstrate that SonicMaster significantly improves sound quality across all artifact categories. Furthermore, subjective listening tests confirm that listeners prefer SonicMaster's enhanced outputs over the original degraded audio, highlighting the effectiveness of our unified approach.
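The abstract names two technical ingredients: paired training data made by applying degradation functions to clean tracks, and a flow-matching objective that maps degraded audio to clean audio. The following is a minimal sketch of both, under stated assumptions rather than the authors' implementation. `clip` and `narrow_stereo` are hypothetical stand-ins for one amplitude-group and one stereo-group degradation; the paper's actual nineteen functions and their parameters are not given here. The flow-matching part assumes a rectified-flow-style linear path whose source is the degraded signal and whose target is the clean one, computed on raw samples. The paper may instead operate on latent representations, start from noise, or condition on the degraded audio and text prompt differently.

```python
import math
import random


def clip(signal, threshold):
    """Hard clipping (hypothetical amplitude-group degradation)."""
    return [max(-threshold, min(threshold, s)) for s in signal]


def narrow_stereo(left, right, width):
    """Scale the side (L-R) component to shrink the stereo image.

    Hypothetical stereo-group degradation: width=1.0 leaves the signal
    unchanged, width=0.0 collapses it to mono.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 * width for l, r in zip(left, right)]
    return ([m + s for m, s in zip(mid, side)],
            [m - s for m, s in zip(mid, side)])


def flow_matching_example(x0, x1, t):
    """Linear interpolation path between source x0 and target x1.

    Returns the point x_t on the path and the constant velocity x1 - x0,
    which a network v(x_t, t, prompt) would be trained to predict.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def mse(a, b):
    """Mean squared error, used here as the flow-matching regression loss."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def euler_integrate(x0, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    random.seed(0)
    n = 64
    # Clean stereo "track": two phase-shifted sine waves.
    clean_l = [0.9 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
    clean_r = [0.9 * math.sin(2 * math.pi * 3 * i / n + 0.5) for i in range(n)]

    # Chain two degradations to build one (degraded, clean) training pair.
    deg_l, deg_r = narrow_stereo(clip(clean_l, 0.5), clip(clean_r, 0.5), width=0.2)
    x0 = deg_l + deg_r  # flatten both channels into one vector
    x1 = clean_l + clean_r

    # One training example: sample t, build x_t and the velocity target.
    t = random.random()
    x_t, v_target = flow_matching_example(x0, x1, t)

    # A trivial "model" that always predicts zero velocity, to show the loss.
    loss = mse([0.0] * len(v_target), v_target)
    print(f"t={t:.3f} loss of zero predictor={loss:.4f}")

    # With an oracle velocity, Euler integration carries x0 to x1.
    recovered = euler_integrate(x0, lambda x, tt: v_target, steps=10)
    print("max error:", max(abs(a - b) for a, b in zip(recovered, x1)))
```

The constant velocity target is what makes a linear path convenient: with a perfect predictor, a handful of Euler steps moves the degraded input onto the clean target. In a trained system, the oracle lambda would be replaced by a network conditioned on `x_t`, `t`, and the text prompt.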