Leveraging Implicit Feedback from Deployment Data in Dialogue
July 26, 2023
Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston
cs.AI
Abstract
We study improving social conversational agents by learning from natural
dialogue between users and a deployed model, without extra annotations. To
implicitly measure the quality of a machine-generated utterance, we leverage
signals like user response length, sentiment and reaction of the future human
utterances in the collected dialogue episodes. Our experiments use the publicly
released deployment data from BlenderBot (Xu et al., 2023). Human evaluation
indicates improvements in our new models over baseline responses; however, we
find that some proxy signals can lead to more generations with undesirable
properties as well. For example, optimizing for conversation length can lead to
more controversial or unfriendly generations compared to the baseline, whereas
optimizing for positive sentiment or reaction can decrease these behaviors.
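The core idea, scoring a bot utterance implicitly by properties of the user's next turn, can be sketched as follows. This is an illustrative assumption, not the paper's code: the weights, the word-count threshold, and the tiny sentiment lexicon are all hypothetical placeholders for the learned signals the authors use.

```python
# Hypothetical sketch of an implicit-feedback proxy score. Signal names,
# weights, and the lexicon are illustrative assumptions, not the paper's method.

POSITIVE = {"great", "thanks", "love", "cool", "nice", "haha"}
NEGATIVE = {"boring", "rude", "stop", "wrong", "hate"}

def implicit_feedback_score(next_user_utterance: str) -> float:
    """Combine reply length and crude lexicon sentiment into one proxy reward."""
    words = [w.strip(".,!?").lower() for w in next_user_utterance.split()]
    # Longer user replies are treated as a weak engagement signal (capped at 1).
    length_signal = min(len(words) / 20.0, 1.0)
    # Net count of positive vs. negative words, squashed into [-1, 1].
    sentiment = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sentiment_signal = max(-1.0, min(1.0, sentiment / 3.0))
    return 0.5 * length_signal + 0.5 * sentiment_signal
```

In a pipeline like the one the abstract describes, such a score could rank or filter candidate bot responses using only logged deployment conversations, with no extra human annotation; the abstract's caveat is that each choice of proxy (length vs. sentiment) biases the resulting model differently.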