Retrospective Learning from Interactions
October 17, 2024
Authors: Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi
cs.AI
Abstract
Multi-turn interactions between large language models (LLMs) and users
naturally include implicit feedback signals. If an LLM responds in an
unexpected way to an instruction, the user is likely to signal it by rephrasing
the request, expressing frustration, or pivoting to an alternative task. Such
signals are task-independent and occupy a relatively constrained subspace of
language, allowing the LLM to identify them even if it fails on the actual
task. This creates an avenue for continually learning from interactions without
additional annotations. We introduce ReSpect, a method to learn from such
signals in past interactions via retrospection. We deploy ReSpect in a new
multimodal interaction scenario, where humans instruct an LLM to solve an
abstract reasoning task with a combinatorial solution space. Through thousands
of interactions with humans, we show how ReSpect gradually improves task
completion rate from 31% to 82%, all without any external annotation.
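The core mechanism the abstract describes, retrospectively decoding implicit feedback from later turns and reusing it as a training signal without external annotation, can be sketched in a few lines. The sketch below is an illustration under assumptions, not the paper's actual ReSpect implementation: the `Turn` fields, the prompt wording, and the `llm.generate` / `llm.finetune` calls are all hypothetical placeholders.

```python
# A minimal sketch of retrospection over past interactions, assuming a
# hypothetical `llm` object with `generate` and `finetune` methods.
from dataclasses import dataclass

@dataclass
class Turn:
    instruction: str  # the user's instruction at this turn
    response: str     # the LLM's response to it
    followup: str     # the user's next utterance, carrying implicit feedback

def decode_feedback(llm, turn: Turn) -> bool:
    """Retrospectively ask the LLM whether the follow-up signals that the
    earlier response was accepted. Because such signals (rephrasing,
    frustration, pivoting to another task) occupy a constrained subspace of
    language, the LLM can classify them even when it fails the task itself."""
    prompt = (
        f"A user gave the instruction:\n{turn.instruction}\n"
        f"The assistant responded:\n{turn.response}\n"
        f"The user then said:\n{turn.followup}\n"
        "Does the user's reply suggest the response was satisfactory? "
        "Answer 'positive' or 'negative'."
    )
    answer = llm.generate(prompt)  # hypothetical generation API
    return "positive" in answer.lower()

def retrospect_and_train(llm, past_interactions):
    """One round of retrospection-based continual learning (sketch):
    relabel past turns with decoded feedback, then fine-tune on the turns
    judged positive. No external annotation is used anywhere."""
    positives = [
        (turn.instruction, turn.response)
        for interaction in past_interactions  # each a list of Turn objects
        for turn in interaction
        if decode_feedback(llm, turn)
    ]
    llm.finetune(positives)  # hypothetical fine-tuning API
    return llm
```

Repeating this decode-then-train round over successive deployment phases is one way to realize the continual improvement the abstract reports (31% to 82% task completion); the exact decoding scheme and training objective are left to the paper.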