Improving Open Language Models by Learning from Organic Interactions

Abstract

We present BlenderBot 3x, an update to the conversational model BlenderBot 3, which is now trained on organic conversation and feedback data from participating users of the system in order to improve both its skills and its safety. We are publicly releasing the de-identified interaction data from participating users for use by the research community, in order to spur further progress. Training models on organic data is challenging because interactions with people "in the wild" include high-quality conversations and feedback as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is preferred in conversation to BlenderBot 3 and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.
