Retrospective Learning from Interactions
October 17, 2024
Authors: Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi
cs.AI
Abstract
Multi-turn interactions between large language models (LLMs) and users
naturally include implicit feedback signals. If an LLM responds in an
unexpected way to an instruction, the user is likely to signal it by rephrasing
the request, expressing frustration, or pivoting to an alternative task. Such
signals are task-independent and occupy a relatively constrained subspace of
language, allowing the LLM to identify them even if it fails on the actual
task. This creates an avenue for continually learning from interactions without
additional annotations. We introduce ReSpect, a method to learn from such
signals in past interactions via retrospection. We deploy ReSpect in a new
multimodal interaction scenario, where humans instruct an LLM to solve an
abstract reasoning task with a combinatorial solution space. Through thousands
of interactions with humans, we show how ReSpect gradually improves task
completion rate from 31% to 82%, all without any external annotation.
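The core mechanism the abstract describes, retrospectively decoding implicit feedback from later turns and reusing it as a training signal without external annotation, can be sketched in a few lines. The sketch below is an illustration under assumptions, not the paper's actual ReSpect implementation: the `Turn` fields, the prompt wording, and the `llm.generate` / `llm.finetune` calls are all hypothetical placeholders.

```python
# A minimal sketch of retrospection over past interactions, assuming a
# hypothetical `llm` object with `generate` and `finetune` methods.
from dataclasses import dataclass

@dataclass
class Turn:
    instruction: str  # the user's instruction at this turn
    response: str     # the LLM's response to it
    followup: str     # the user's next utterance, carrying implicit feedback

def decode_feedback(llm, turn: Turn) -> bool:
    """Retrospectively ask the LLM whether the follow-up signals that the
    earlier response was accepted. Because such signals (rephrasing,
    frustration, pivoting to another task) occupy a constrained subspace of
    language, the LLM can classify them even when it fails the task itself."""
    prompt = (
        f"A user gave the instruction:\n{turn.instruction}\n"
        f"The assistant responded:\n{turn.response}\n"
        f"The user then said:\n{turn.followup}\n"
        "Does the user's reply suggest the response was satisfactory? "
        "Answer 'positive' or 'negative'."
    )
    answer = llm.generate(prompt)  # hypothetical generation API
    return "positive" in answer.lower()

def retrospect_and_train(llm, past_interactions):
    """One round of retrospection-based continual learning (sketch):
    relabel past turns with decoded feedback, then fine-tune on the turns
    judged positive. No external annotation is used anywhere."""
    positives = [
        (turn.instruction, turn.response)
        for interaction in past_interactions  # each a list of Turn objects
        for turn in interaction
        if decode_feedback(llm, turn)
    ]
    llm.finetune(positives)  # hypothetical fine-tuning API
    return llm
```

Repeating this decode-then-train round over successive deployment phases is one way to realize the continual improvement the abstract reports (31% to 82% task completion); the exact decoding scheme and training objective are left to the paper.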