Improving Open Language Models by Learning from Organic Interactions

Abstract

We present BlenderBot 3x, an update to the conversational model BlenderBot 3, which is now trained on organic conversation and feedback data from participating users of the system in order to improve both its skills and its safety. We are publicly releasing the de-identified interaction data from participating users for use by the research community, in order to spur further progress. Training models on organic data is challenging because interactions with people "in the wild" include high-quality conversations and feedback as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is preferred in conversation over BlenderBot 3 and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.
