Fine-tuning Language Models for Factuality
November 14, 2023
Authors: Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
cs.AI
Abstract
The fluency and creativity of large pre-trained language models (LLMs) have
led to their widespread use, sometimes even as a replacement for traditional
search engines. Yet language models are prone to making convincing but
factually inaccurate claims, often referred to as 'hallucinations.' These
errors can inadvertently spread misinformation or harmfully perpetuate
misconceptions. Further, manual fact-checking of model responses is a
time-consuming process, making human factuality labels expensive to acquire. In
this work, we fine-tune language models to be more factual, without human
labeling and targeting more open-ended generation settings than past work. We
leverage two key recent innovations in NLP to do so. First, several recent
works have proposed methods for judging the factuality of open-ended text by
measuring consistency with an external knowledge base or simply a large model's
confidence scores. Second, the direct preference optimization algorithm enables
straightforward fine-tuning of language models on objectives other than
supervised imitation, using a preference ranking over possible model responses.
We show that learning from automatically generated factuality preference
rankings, generated either through existing retrieval systems or our novel
retrieval-free approach, significantly improves the factuality (percent of
generated claims that are correct) of Llama-2 on held-out topics compared with
RLHF or decoding strategies targeted at factuality. At 7B scale, compared to
Llama-2-chat, we observe 58% and 40% reduction in factual error rate when
generating biographies and answering medical questions, respectively.
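
The abstract describes two ingredients: automatically ranking candidate responses by an estimated factuality score (via retrieval against a knowledge base or the model's own confidence), and fine-tuning on the resulting preference pairs with direct preference optimization (DPO). The sketch below is a minimal illustration of those two pieces, not the authors' code; the function names (factuality_score, make_preference_pair, dpo_loss) and the toy scoring rule are assumptions introduced here for clarity. Only the DPO loss formula itself follows the standard published objective.

import torch
import torch.nn.functional as F


def factuality_score(claims_correct: int, claims_total: int) -> float:
    """Fraction of extracted atomic claims judged correct, e.g. by
    checking them against a retrieval system or a model confidence score."""
    return claims_correct / max(claims_total, 1)


def make_preference_pair(resp_a: str, resp_b: str,
                         score_a: float, score_b: float):
    """Return (chosen, rejected), ordered by estimated factuality."""
    return (resp_a, resp_b) if score_a >= score_b else (resp_b, resp_a)


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Standard DPO objective: -log sigmoid(beta * (margin_policy - margin_ref)),
    where each margin is the log-probability gap of chosen over rejected."""
    margin = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()


if __name__ == "__main__":
    # Toy example: response A has 9/10 correct claims, B has 5/10,
    # so A becomes the preferred ("chosen") response.
    chosen, rejected = make_preference_pair(
        "response A", "response B",
        factuality_score(9, 10), factuality_score(5, 10))
    print(chosen, "preferred over", rejected)

    # Dummy sequence log-probabilities for one preference pair.
    loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                    torch.tensor([-13.0]), torch.tensor([-14.5]))
    print("DPO loss:", float(loss))

In this setup no human labels are needed: the preference ordering comes entirely from the automatic factuality estimate, and DPO then shifts probability mass toward the more factual response relative to the reference model.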