Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Abstract

Amphion is a toolkit for audio, music, and speech generation. Its purpose is to support reproducible research and to help junior researchers and engineers get started in audio, music, and speech generation research and development. Amphion offers a unique feature: visualizations of classic models and architectures. We believe these visualizations benefit junior researchers and engineers who want a better understanding of the models. The North-Star objective of Amphion is to offer a platform for studying the conversion of any input into general audio. Amphion is designed to support individual generation tasks. In addition to these specific generation tasks, Amphion also includes several vocoders and evaluation metrics. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for ensuring consistent evaluation across generation tasks. In this paper, we provide a high-level overview of Amphion.
