Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
February 16, 2024
Authors: Zekun Li, Zhiyu Zoey Chen, Mike Ross, Patrick Huber, Seungwhan Moon, Zhaojiang Lin, Xin Luna Dong, Adithya Sagar, Xifeng Yan, Paul A. Crook
cs.AI
Abstract
Large language models (LLMs) are increasingly prevalent in conversational
systems due to their advanced understanding and generative capabilities in
general contexts. However, their effectiveness in task-oriented dialogue
(TOD), which requires not only response generation but also effective dialogue
state tracking (DST) within specific tasks and domains, remains
unsatisfactory. In this work, we propose FnCTOD, a novel approach that solves
DST with LLMs through function calling. This method improves zero-shot DST,
allowing adaptation to diverse domains without extensive data collection or
model tuning. Our experimental results demonstrate that our approach achieves
exceptional performance with both modestly sized open-source and proprietary
LLMs: with in-context prompting, it enables various 7B or 13B parameter models
to surpass the previous state-of-the-art (SOTA) achieved by ChatGPT, and
improves ChatGPT's own performance, beating the SOTA by 5.6% average joint
goal accuracy (JGA).
Individual model results for GPT-3.5 and GPT-4 are boosted by 4.8% and 14%,
respectively. We also show that by fine-tuning on a small collection of diverse
task-oriented dialogues, we can equip modestly sized models, specifically a 13B
parameter LLaMA2-Chat model, with function-calling capabilities and DST
performance comparable to ChatGPT while maintaining their chat capabilities. We
plan to open-source our experimental code and models.
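
To make the mechanism concrete, below is a minimal illustrative sketch of DST cast as function calling: each dialogue domain is exposed as a function schema whose arguments are the slots to track, and the model's emitted function call is parsed back into a dialogue state. This is not the authors' released code; the `book_hotel` schema, its slot names, and the `parse_function_call` helper are all hypothetical.

```python
import json

# Hypothetical function schema for one TOD domain. Each slot the
# tracker must fill is declared as an argument of the "function".
HOTEL_SCHEMA = {
    "name": "book_hotel",
    "description": "Track the user's hotel booking constraints.",
    "parameters": {
        "type": "object",
        "properties": {
            "area": {"type": "string", "description": "Part of town, e.g. 'north'"},
            "price_range": {"type": "string", "enum": ["cheap", "moderate", "expensive"]},
            "stars": {"type": "string", "description": "Desired star rating"},
        },
    },
}

def parse_function_call(model_output: str, schema: dict) -> dict:
    """Parse a model completion like `book_hotel({"area": "north"})`
    into a {domain: {slot: value}} dialogue state."""
    name, _, arg_str = model_output.partition("(")
    args = json.loads(arg_str.rstrip(")"))
    # Keep only slots that the domain schema actually declares.
    slots = schema["parameters"]["properties"]
    state = {k: v for k, v in args.items() if k in slots}
    return {name.strip(): state}

# A plausible completion for the user turn
# "I need a cheap hotel in the north."
completion = 'book_hotel({"area": "north", "price_range": "cheap"})'
print(parse_function_call(completion, HOTEL_SCHEMA))
# {'book_hotel': {'area': 'north', 'price_range': 'cheap'}}
```

In the zero-shot setting the abstract describes, such a schema would be supplied in the prompt (or through a model's native function-calling interface) so the LLM can fill the arguments directly from the conversation, without in-domain training data.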