SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

August 5, 2025
Authors: Jan Melechovsky, Ambuj Mehrish, Dorien Herremans
cs.AI

Abstract

Music recordings often suffer from audio quality issues such as excessive reverberation, distortion, clipping, tonal imbalances, and a narrowed stereo image, especially when created in non-professional settings without specialized equipment or expertise. These problems are typically corrected using separate specialized tools and manual adjustments. In this paper, we introduce SonicMaster, the first unified generative model for music restoration and mastering that addresses a broad spectrum of audio artifacts with text-based control. SonicMaster can be conditioned on natural language instructions to apply targeted enhancements, or it can operate in an automatic mode for general restoration. To train this model, we construct the SonicMaster dataset, a large dataset of paired degraded and high-quality tracks built by simulating common degradation types with nineteen degradation functions belonging to five enhancement groups: equalization, dynamics, reverb, amplitude, and stereo. Our approach leverages a flow-matching generative training paradigm to learn an audio transformation that maps degraded inputs to their cleaned, mastered versions, guided by text prompts. Objective audio quality metrics demonstrate that SonicMaster significantly improves sound quality across all artifact categories. Furthermore, subjective listening tests confirm that listeners prefer SonicMaster's enhanced outputs over the original degraded audio, highlighting the effectiveness of our unified approach.
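
To make the data pipeline described in the abstract more concrete, the sketch below shows how a degraded/clean training pair and a flow-matching regression target might be formed. It is illustrative only and is not the authors' implementation: the function names (`hard_clip`, `narrow_stereo`, `flow_matching_pair`), the parameter values, the choice of a straight-line path between the degraded and clean signals, and the use of raw waveforms rather than any learned representation are all assumptions. Only two of the nineteen degradation functions are imitated here, and text conditioning is omitted.

```python
import numpy as np


def hard_clip(audio, threshold=0.5):
    """Hypothetical clipping degradation: flatten samples beyond +/- threshold."""
    return np.clip(audio, -threshold, threshold)


def narrow_stereo(audio, width=0.3):
    """Hypothetical stereo degradation using mid/side processing.

    audio has shape (2, num_samples). width=1.0 leaves the signal unchanged,
    width=0.0 collapses it to mono.
    """
    mid = 0.5 * (audio[0] + audio[1])
    side = 0.5 * (audio[0] - audio[1])
    return np.stack([mid + width * side, mid - width * side])


def flow_matching_pair(x0, x1, t):
    """Straight-line flow matching (an assumed path, not confirmed by the source).

    Returns the point x_t on the line from x0 (degraded) to x1 (clean) and the
    constant velocity x1 - x0 that a model would be trained to predict at x_t.
    """
    x_t = (1.0 - t) * x0 + t * x1
    velocity = x1 - x0
    return x_t, velocity


# Build one synthetic training pair from random stereo "audio" (placeholder data).
rng = np.random.default_rng(0)
clean = rng.uniform(-1.0, 1.0, size=(2, 44100))
degraded = narrow_stereo(hard_clip(clean))

# Sample an intermediate point and its regression target at time t = 0.25.
x_t, target_velocity = flow_matching_pair(degraded, clean, t=0.25)
```

In a training loop, a network would receive `x_t`, the time `t`, and an encoding of the text prompt, and would be optimized to predict `target_velocity`; at inference, integrating the predicted velocity from the degraded input toward `t = 1` would yield the restored audio.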