System-Level Natural Language Feedback

Abstract

Natural language (NL) feedback contains rich information about the user experience. Existing studies focus on an instance-level approach, where feedback is used to refine specific examples, and disregard its system-wide application. This paper proposes a general framework for unlocking the system-level use of NL feedback. We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process in order to produce better models. In particular, this is done through: (i) metric design for tasks; and (ii) language model prompt design for refining model responses. We conduct two case studies of this approach, improving search query generation and dialog response generation, and demonstrate the effectiveness of system-level feedback. We show that combining system-level feedback with instance-level feedback brings further gains, and that human-written instance-level feedback results in more grounded refinements than GPT-3.5-written feedback, underscoring the importance of human feedback for building systems.
