PodAgent: A Comprehensive Framework for Podcast Generation

Abstract

Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in in-depth content generation and in appropriate, expressive voice production. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content through a Host-Guest-Writer multi-agent collaboration system, 2) builds a voice pool for suitable voice-role matching, and 3) uses an LLM-enhanced speech synthesis method to generate expressive conversational speech. Given the absence of standardized evaluation criteria for podcast-like audio generation, we developed comprehensive assessment guidelines to evaluate the model's performance. Experimental results demonstrate PodAgent's effectiveness: it significantly surpasses direct GPT-4 generation in topic-discussion dialogue content, achieves 87.4% voice-matching accuracy, and produces more expressive speech through LLM-guided synthesis. Demo page: https://podcast-agent.github.io/demo/. Source code: https://github.com/yujxx/PodAgent.
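To make the voice-role matching idea concrete, the following is a minimal illustrative sketch and not PodAgent's actual implementation. It assumes each role and each voice in the pool carries three hypothetical attributes (gender, age, style), assigns voices greedily by attribute overlap, and computes a matching accuracy against a reference assignment. All names here (`Voice`, `Role`, `assign_voices`, `matching_accuracy`) are assumptions introduced for illustration only.

```python
from dataclasses import dataclass

# Hypothetical attribute set; the abstract does not specify how voices are described.
FIELDS = ("gender", "age", "style")


@dataclass(frozen=True)
class Voice:
    voice_id: str
    gender: str
    age: str
    style: str


@dataclass(frozen=True)
class Role:
    name: str
    gender: str
    age: str
    style: str


def match_score(role: Role, voice: Voice) -> int:
    """Count how many attributes the role and the voice share."""
    return sum(getattr(role, f) == getattr(voice, f) for f in FIELDS)


def assign_voices(roles: list[Role], pool: list[Voice]) -> dict[str, str]:
    """Greedily give each role the best-scoring voice that is still unused."""
    if len(pool) < len(roles):
        raise ValueError("voice pool is smaller than the number of roles")
    remaining = list(pool)
    assignment: dict[str, str] = {}
    for role in roles:
        best = max(remaining, key=lambda v: match_score(role, v))
        assignment[role.name] = best.voice_id
        remaining.remove(best)  # each voice is used at most once
    return assignment


def matching_accuracy(assignment: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of roles whose assigned voice equals the reference voice."""
    hits = sum(assignment.get(role) == voice for role, voice in reference.items())
    return hits / len(reference)


# Toy data, invented for this example.
pool = [
    Voice("v1", "male", "senior", "calm"),
    Voice("v2", "female", "adult", "warm"),
    Voice("v3", "male", "adult", "energetic"),
]
roles = [
    Role("host", "female", "adult", "warm"),
    Role("guest", "male", "senior", "calm"),
]
assignment = assign_voices(roles, pool)
accuracy = matching_accuracy(assignment, {"host": "v2", "guest": "v1"})
```

A real system would presumably match on richer descriptions than three categorical fields, and the reported 87.4% accuracy comes from the authors' own evaluation protocol, which this toy metric does not reproduce.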