FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

July 4, 2024
Authors: Tongyi SpeechTeam
cs.AI

Abstract

This report introduces FunAudioLLM, a model family designed to enhance natural voice interaction between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which generates natural speech with control over language, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for more than 50 languages, while CosyVoice excels at multilingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction following. The SenseVoice and CosyVoice models have been open-sourced on ModelScope and Hugging Face, and the corresponding training, inference, and fine-tuning code has been released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.
