Fine-tuning Language Models for Factuality

Abstract

The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is time-consuming, which makes human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual without human labeling, targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply by using a large model's confidence scores. Second, the direct preference optimization algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, generated either through existing retrieval systems or our novel retrieval-free approach, significantly improves the factuality (percent of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF (reinforcement learning from human feedback) or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe reductions of 58% and 40% in factual error rate when generating biographies and answering medical questions, respectively.
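The pipeline the abstract describes (score candidate responses for factuality, turn the scores into preference pairs, then fine-tune with direct preference optimization) can be illustrated with a minimal sketch. The function names, the example scores, and the value of `beta` below are assumptions made for illustration and are not taken from the paper. The sketch also assumes that a factuality score per response is already available, whether from a retrieval-based or a confidence-based estimator. The loss is the standard DPO objective for a single preference pair, evaluated on precomputed sequence log-probabilities.

```python
import math
from itertools import combinations


def build_preference_pairs(responses, scores):
    """Turn per-response factuality scores into (preferred, dispreferred) pairs.

    `scores` is assumed to come from some automatic factuality estimator;
    how it is computed is outside the scope of this sketch.
    """
    pairs = []
    for (r_a, s_a), (r_b, s_b) in combinations(zip(responses, scores), 2):
        if s_a == s_b:
            continue  # a tie carries no preference signal, so skip it
        pairs.append((r_a, r_b) if s_a > s_b else (r_b, r_a))
    return pairs


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * (log-ratio_w - log-ratio_l)).

    `w` is the preferred response and `l` the dispreferred one. The inputs are
    sequence log-probabilities under the policy being tuned and under the
    frozen reference model. `beta=0.1` is an illustrative default only.
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return softplus(-margin)  # equals -log(sigmoid(margin))


if __name__ == "__main__":
    # Hypothetical candidate responses and factuality scores for one prompt.
    responses = ["bio_a", "bio_b", "bio_c"]
    scores = [0.9, 0.5, 0.9]
    print(build_preference_pairs(responses, scores))
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In an actual training loop the log-probabilities would be computed by the language model and the loss averaged over a batch of pairs; here they are plain floats so that the arithmetic is easy to follow.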
