Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
February 16, 2024
Authors: Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook
cs.AI
Abstract
Large language models (LLMs) are increasingly prevalent in conversational systems due to their advanced understanding and generative capabilities in general contexts. However, their effectiveness in task-oriented dialogue (TOD), which requires not only response generation but also effective dialogue state tracking (DST) within specific tasks and domains, remains unsatisfactory. In this work, we propose a novel approach, FnCTOD, that solves DST with LLMs through function calling. This method improves zero-shot DST, allowing adaptation to diverse domains without extensive data collection or model tuning. Our experimental results demonstrate that the approach achieves exceptional performance with both modestly sized open-source and proprietary LLMs: with in-context prompting it enables various 7B- or 13B-parameter models to surpass the previous state of the art (SOTA) achieved by ChatGPT, and it improves ChatGPT's performance, beating the SOTA by 5.6% average joint goal accuracy (JGA). Individual results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%, respectively. We also show that by fine-tuning on a small collection of diverse task-oriented dialogues, we can equip a modestly sized model, specifically a 13B-parameter LLaMA2-Chat model, with function-calling capabilities and DST performance comparable to ChatGPT while preserving its chat capabilities. We plan to open-source our experimental code and models.
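
To make the core idea concrete, the sketch below shows one way DST can be framed as function calling: each task domain is exposed to the model as a function whose parameters are the domain's trackable slots, and the model reports the dialogue state by emitting a call to that function with the slot values mentioned so far. This is a minimal Python illustration under assumed conventions; the book_hotel schema, its slot names, and the build_dst_messages helper are hypothetical examples and do not reproduce the paper's exact prompt or schema format.

# Minimal sketch: dialogue state tracking (DST) framed as function calling.
# The "hotel" domain schema below is a hypothetical example, not the paper's.
import json

# A domain is exposed as a function whose parameters are its trackable slots.
HOTEL_SCHEMA = {
    "name": "book_hotel",
    "description": "Track the user's hotel booking constraints.",
    "parameters": {
        "type": "object",
        "properties": {
            "area": {"type": "string", "description": "Part of town, e.g. 'centre'"},
            "price_range": {"type": "string", "enum": ["cheap", "moderate", "expensive"]},
            "stars": {"type": "integer", "description": "Hotel star rating"},
        },
    },
}

def build_dst_messages(dialogue_history: list[dict]) -> list[dict]:
    """Assemble a chat prompt asking a function-calling LLM to report the
    current dialogue state as arguments to the domain function."""
    system = (
        "You are a dialogue state tracker. After reading the conversation, "
        "call the provided function with every slot value the user has specified."
    )
    return [{"role": "system", "content": system}, *dialogue_history]

dialogue = [
    {"role": "user", "content": "I need a cheap hotel in the centre."},
    {"role": "assistant", "content": "Sure, any star rating in mind?"},
    {"role": "user", "content": "Four stars, please."},
]

# These messages plus HOTEL_SCHEMA (passed as a tool/function definition) would
# go to any function-calling-capable chat model; the emitted call, e.g.
#   book_hotel(area="centre", price_range="cheap", stars=4),
# is parsed back into slot-value pairs to give the zero-shot dialogue state.
print(json.dumps({"messages": build_dst_messages(dialogue), "tools": [HOTEL_SCHEMA]}, indent=2))

Because the slot schema is supplied entirely at inference time, swapping in a new domain only requires a new function definition, which is what allows the zero-shot adaptation described in the abstract.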