Leveraging Implicit Feedback from Deployment Data in Dialogue
July 26, 2023
Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston
cs.AI
Abstract
We study improving social conversational agents by learning from natural
dialogue between users and a deployed model, without extra annotations. To
implicitly measure the quality of a machine-generated utterance, we leverage
signals like user response length, sentiment and reaction of the future human
utterances in the collected dialogue episodes. Our experiments use the publicly
released deployment data from BlenderBot (Xu et al., 2023). Human evaluation
indicates improvements in our new models over baseline responses; however, we
find that some proxy signals can lead to more generations with undesirable
properties as well. For example, optimizing for conversation length can lead to
more controversial or unfriendly generations compared to the baseline, whereas
optimizing for positive sentiment or reaction can decrease these behaviors.
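As a rough illustration of how such implicit-feedback proxy signals could be computed from logged dialogue episodes, the sketch below scores each bot utterance by the length and a crude keyword-based sentiment of the human turn that follows it. The episode format, word lists, and weighting are assumptions made for illustration only; they are not the paper's implementation or the BlenderBot deployment schema.

```python
# Minimal sketch (not the authors' code): scoring bot utterances with
# implicit-feedback proxy signals derived from the next human turn.
from dataclasses import dataclass
from typing import List

# Hypothetical keyword lexicons standing in for a real sentiment/reaction model.
POSITIVE_WORDS = {"great", "love", "thanks", "haha", "nice", "cool", "awesome"}
NEGATIVE_WORDS = {"boring", "stop", "hate", "rude", "wrong"}

@dataclass
class Turn:
    speaker: str  # "bot" or "human"
    text: str

def proxy_rewards(episode: List[Turn]) -> List[float]:
    """Assign each bot turn a score based on the following human turn."""
    rewards = []
    for i, turn in enumerate(episode):
        if turn.speaker != "bot" or i + 1 >= len(episode):
            continue
        reply = episode[i + 1].text.lower().split()
        # Longer human replies are treated as a weak signal of engagement.
        length_signal = min(len(reply) / 20.0, 1.0)
        # Crude sentiment/reaction signal from keyword counts.
        sentiment_signal = (
            sum(w in POSITIVE_WORDS for w in reply)
            - sum(w in NEGATIVE_WORDS for w in reply)
        )
        rewards.append(length_signal + sentiment_signal)
    return rewards

episode = [
    Turn("bot", "Do you have any pets?"),
    Turn("human", "Yes! I love my dog, he is awesome and we hike together."),
]
print(proxy_rewards(episode))  # one score for the single bot turn
```

Scores of this kind could then be used, for example, to rerank candidate responses or to select training examples for further fine-tuning; which proxy is optimized matters, as the abstract's findings on conversation length versus sentiment show.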