PodAgent: A Comprehensive Framework for Podcast Generation

Abstract

Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in generating in-depth content and producing appropriate, expressive voices. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content through a Host-Guest-Writer multi-agent collaboration system, 2) builds a voice pool for suitable voice-role matching, and 3) uses an LLM-enhanced speech synthesis method to generate expressive conversational speech. Given the absence of standardized evaluation criteria for podcast-like audio generation, we developed comprehensive assessment guidelines to evaluate the model's performance effectively. Experimental results demonstrate PodAgent's effectiveness: it significantly surpasses direct GPT-4 generation in topic-discussion dialogue content, achieves 87.4% voice-matching accuracy, and produces more expressive speech through LLM-guided synthesis. Demo page: https://podcast-agent.github.io/demo/. Source code: https://github.com/yujxx/PodAgent.
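The abstract does not say how voices in the pool are matched to roles. As a minimal sketch, assuming each pool voice and each role are described by free-form tags (for example gender, age, speaking style), voice-role matching can be framed as giving each role the unused voice whose tags have the highest Jaccard overlap with the role's desired tags. The names `VoiceProfile`, `jaccard`, and `match_voices`, and the tag vocabulary, are hypothetical illustrations, not PodAgent's actual method or API.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    """Hypothetical voice-pool entry: an identifier plus descriptive tags."""
    voice_id: str
    tags: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_voices(roles: dict[str, set[str]], pool: list[VoiceProfile]) -> dict[str, str]:
    """Greedily give each role (in dict order) the unused voice with the highest tag overlap."""
    assignment: dict[str, str] = {}
    used: set[str] = set()
    for role, wanted in roles.items():
        candidates = [v for v in pool if v.voice_id not in used]
        if not candidates:
            raise ValueError(f"voice pool exhausted before assigning role {role!r}")
        # max() returns the first candidate on ties, so pool order breaks ties.
        best = max(candidates, key=lambda v: jaccard(wanted, v.tags))
        assignment[role] = best.voice_id
        used.add(best.voice_id)  # each speaker gets a distinct voice
    return assignment


# Hypothetical pool and role descriptions for a two-speaker episode.
pool = [
    VoiceProfile("v1", {"male", "middle-aged", "calm"}),
    VoiceProfile("v2", {"female", "young", "energetic"}),
    VoiceProfile("v3", {"male", "young", "energetic"}),
]
roles = {
    "host": {"female", "young", "energetic"},
    "guest": {"male", "middle-aged", "calm", "expert"},
}
assignment = match_voices(roles, pool)
print(assignment)  # {'host': 'v2', 'guest': 'v1'}
```

Assigning greedily in role order keeps the sketch short and guarantees distinct voices per speaker; a global assignment (for example the Hungarian algorithm over a role-by-voice similarity matrix) would avoid cases where an early role takes a voice that a later role matched better.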