MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

May 31, 2025
Authors: Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen
cs.AI

Abstract

Neural audio codecs have made significant strides in efficiently mapping raw audio waveforms into discrete token representations, which are foundational for contemporary audio generative models. However, most existing codecs are optimized primarily for reconstruction quality, often at the expense of the downstream modelability of the encoded tokens. Motivated by the need to overcome this bottleneck, we introduce MagiCodec, a novel single-layer, streaming Transformer-based audio codec. MagiCodec is designed with a multistage training pipeline that incorporates Gaussian noise injection and latent regularization, explicitly targeting the enhancement of semantic expressiveness in the generated codes while preserving high reconstruction fidelity. We analytically derive the effect of noise injection in the frequency domain, demonstrating its efficacy in attenuating high-frequency components and fostering robust tokenization. Extensive experimental evaluations show that MagiCodec surpasses state-of-the-art codecs in both reconstruction quality and downstream tasks. Notably, the tokens produced by MagiCodec exhibit Zipf-like distributions, as observed in natural languages, thereby improving compatibility with language-model-based generative architectures. The code and pre-trained models are available at https://github.com/Ereboas/MagiCodec.
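
The Zipf-like claim above can be checked on any token stream with a rank-frequency fit. The following is a minimal sketch, not the authors' evaluation code: the function name `zipf_exponent` is hypothetical, and the token sequences are synthetic stand-ins rather than real MagiCodec output. It estimates the exponent s in frequency ~ rank^(-s) by ordinary least squares in log-log space; values near 1 suggest a natural-language-like distribution, while values near 0 suggest near-uniform codebook usage.

```python
import math
import random
from collections import Counter


def zipf_exponent(tokens):
    """Estimate s in freq ~ rank^(-s) via least squares on log-log rank-frequency data."""
    # Token counts sorted from most to least frequent; position + 1 is the rank.
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope is negative for a decaying distribution; return its magnitude.
    return -cov / var


# Synthetic stand-ins for codec token IDs (assumption: NOT real MagiCodec tokens).
rng = random.Random(0)
vocab = list(range(1, 201))
zipf_like = rng.choices(vocab, weights=[1.0 / r for r in vocab], k=50_000)
uniform = rng.choices(vocab, k=50_000)

print("zipf-like exponent:", zipf_exponent(zipf_like))
print("uniform exponent:  ", zipf_exponent(uniform))
```

To apply this to a real codec, replace the synthetic lists with the flattened sequence of token IDs produced by the encoder. A least-squares fit on log-log data is a rough diagnostic only; it is sensitive to noise in the low-frequency tail.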
