Leveraging Implicit Feedback from Deployment Data in Dialogue

Abstract

We study how to improve social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals from the future human utterances in the collected dialogue episodes, such as user response length, sentiment, and reaction. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates that our new models improve over baseline responses; however, we find that some proxy signals can also lead to more generations with undesirable properties. For example, optimizing for conversation length can lead to more controversial or unfriendly generations compared to the baseline, whereas optimizing for positive sentiment or reaction can decrease these behaviors.
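
To make the idea of an implicit length signal concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dialogue episode is a list of turns and labels each bot utterance by whether the next human reply reaches a word-count threshold. The names `Turn` and `length_signal` and the threshold `min_words=5` are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "bot" or "human" (hypothetical convention)
    text: str


def length_signal(episode, min_words=5):
    """Pair each bot utterance with a binary proxy label.

    The label is True when the next human utterance has at least
    `min_words` whitespace-separated words. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    labels = []
    for i, turn in enumerate(episode):
        if turn.speaker != "bot":
            continue
        # Look ahead for the first human utterance after this bot turn.
        reply = next((t for t in episode[i + 1:] if t.speaker == "human"), None)
        if reply is None:
            continue  # no future human utterance, so no signal is available
        labels.append((turn.text, len(reply.text.split()) >= min_words))
    return labels


episode = [
    Turn("human", "hi"),
    Turn("bot", "Do you like hiking?"),
    Turn("human", "Yes, I go every weekend with my dog"),
    Turn("bot", "Cool."),
    Turn("human", "ok"),
]
for utterance, label in length_signal(episode):
    print(label, utterance)
```

Labels of this kind could serve as training targets for a scorer of candidate responses. A sentiment or reaction signal would follow the same look-ahead pattern, with a classifier applied to the reply in place of the word count.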
