DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
June 19, 2024
Authors: Jiho Kim, Woosog Chay, Hyeonji Hwang, Daeun Kyung, Hyunseung Chung, Eunbyeol Cho, Yohan Jo, Edward Choi
cs.AI
Abstract
Recent advancements in Large Language Models (LLMs) have significantly
enhanced the capabilities of conversational agents, making them applicable to
various fields (e.g., education). Despite their progress, the evaluation of the
agents often overlooks the complexities of real-world conversations, such as
real-time interactions, multi-party dialogues, and extended contextual
dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue
simulator. In this simulator, an agent is assigned the role of a character from
popular TV shows, requiring it to respond to spontaneous questions using past
dialogue information and to distinguish between known and unknown information.
Key features of DialSim include evaluating the agent's ability to respond
within a reasonable time limit, handling long-term multi-party dialogues, and
managing adversarial settings (e.g., swapping character names) to challenge the
agent's reliance on pre-trained knowledge. We use this simulator to
evaluate the latest conversational agents and analyze their limitations. Our
experiments highlight both the strengths and weaknesses of these agents,
providing valuable insights for future improvements in the field of
conversational AI. DialSim is available at
https://github.com/jiho283/Simulator.
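
Two of the features named in the abstract, the response time limit and the adversarial swapping of character names, can be illustrated with a minimal Python sketch. This is not DialSim's actual code: the function names `swap_names` and `timed_answer`, the callable `agent` interface, and the choice to score a late answer as `None` are all assumptions made for illustration.

```python
import re
import time


def swap_names(text, a, b):
    """Swap every whole-word occurrence of names a and b (hypothetical helper)."""
    pattern = re.compile(r"\b(" + re.escape(a) + "|" + re.escape(b) + r")\b")
    # A single pass avoids the bug of replacing a -> b and then b -> a again.
    return pattern.sub(lambda m: b if m.group(1) == a else a, text)


def timed_answer(agent, question, time_limit_s):
    """Return the agent's answer, or None if it took longer than the limit.

    Assumption: `agent` is any callable taking a question string. The call is
    timed after the fact, not interrupted, so a slow agent still runs to the end.
    """
    start = time.monotonic()
    answer = agent(question)
    elapsed = time.monotonic() - start
    return answer if elapsed <= time_limit_s else None
```

Swapping names in the dialogue history means an agent that answers from memorized knowledge of the show, instead of from the conversation it was given, will tend to name the wrong character.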