MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
May 31, 2025
Authors: Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen
cs.AI
Abstract
Neural audio codecs have made significant strides in efficiently mapping raw
audio waveforms into discrete token representations, which are foundational for
contemporary audio generative models. However, most existing codecs are
optimized primarily for reconstruction quality, often at the expense of the
downstream modelability of the encoded tokens. To overcome this bottleneck, we
introduce MagiCodec, a novel single-layer, streaming Transformer-based audio
codec. MagiCodec is designed
with a multistage training pipeline that incorporates Gaussian noise injection
and latent regularization, explicitly targeting the enhancement of semantic
expressiveness in the generated codes while preserving high reconstruction
fidelity. We analytically derive the effect of noise injection in the frequency
domain, demonstrating its efficacy in attenuating high-frequency components and
fostering robust tokenization. Extensive experimental evaluations show that
MagiCodec surpasses state-of-the-art codecs in both reconstruction quality and
downstream tasks. Notably, the tokens produced by MagiCodec exhibit Zipf-like
distributions, as observed in natural languages, thereby improving
compatibility with language-model-based generative architectures. The code and
pre-trained models are available at https://github.com/Ereboas/MagiCodec.
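
To make the Zipf-like claim concrete, here is a minimal sketch of one common way to check whether a token stream follows a Zipf-like law: fit a line to the log-log rank-frequency curve and read off the exponent. This is not the authors' evaluation code. The function name `zipf_exponent` and the synthetic token stream are assumptions made purely for illustration; real use would substitute the token IDs emitted by a codec.

```python
from collections import Counter

import numpy as np


def zipf_exponent(token_ids):
    """Estimate s in frequency ~ rank^(-s) by least squares in log-log space.

    Hypothetical helper for illustration; not part of the MagiCodec codebase.
    """
    # Token frequencies sorted from most to least common.
    counts = np.array(sorted(Counter(token_ids).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(counts) + 1)
    # Slope of log(frequency) against log(rank); Zipf's law predicts a slope near -1.
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope


# Synthetic stand-in for codec tokens: token r appears about 1000 / r times.
tokens = [r for r in range(1, 51) for _ in range(round(1000 / r))]
s = zipf_exponent(tokens)
print(f"estimated Zipf exponent: {s:.2f}")
```

An exponent near 1 indicates a natural-language-like rank-frequency profile, while a value near 0 indicates roughly uniform token usage. A least-squares fit on the log-log curve is a rough diagnostic only, since it weights the sparse tail heavily.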