DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents

Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite this progress, the evaluation of these agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from a popular TV show and must respond to spontaneous questions using information from past dialogues, while distinguishing between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swapping character names) that challenge the agent's reliance on pre-trained knowledge. We use this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in conversational AI. DialSim is available at https://github.com/jiho283/Simulator.
