Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
December 15, 2023
Authors: Xueyao Zhang, Liumeng Xue, Yuancheng Wang, Yicheng Gu, Xi Chen, Zihao Fang, Haopeng Chen, Lexiao Zou, Chaoren Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu
cs.AI
Abstract
Amphion is a toolkit for audio, music, and speech generation. Its purpose is
to support reproducible research and help junior researchers and engineers get
started in audio, music, and speech generation research and development.
Amphion offers a unique feature: visualizations of classic models
or architectures. We believe that these visualizations are beneficial for
junior researchers and engineers who wish to gain a better understanding of these
models. The North-Star objective of Amphion is to offer a platform for studying
the conversion of any inputs into general audio. Amphion is designed to support
individual generation tasks. In addition to the specific generation tasks,
Amphion also includes several vocoders and evaluation metrics. A vocoder is an
important module for producing high-quality audio signals, while evaluation
metrics are critical for ensuring consistent evaluation across generation tasks. In
this paper, we provide a high-level overview of Amphion.