Fine-tuning Language Models for Factuality

Abstract

The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual fact-checking of model responses is a time-consuming process, making human factuality labels expensive to acquire. In this work, we fine-tune language models to be more factual, without human labeling and targeting more open-ended generation settings than past work. We leverage two key recent innovations in NLP to do so. First, several recent works have proposed methods for judging the factuality of open-ended text by measuring consistency with an external knowledge base or simply a large model's confidence scores. Second, the direct preference optimization (DPO) algorithm enables straightforward fine-tuning of language models on objectives other than supervised imitation, using a preference ranking over possible model responses. We show that learning from automatically generated factuality preference rankings, produced either by existing retrieval systems or by our novel retrieval-free approach, significantly improves the factuality (percentage of generated claims that are correct) of Llama-2 on held-out topics compared with RLHF or decoding strategies targeted at factuality. At 7B scale, compared to Llama-2-chat, we observe 58% and 40% reductions in factual error rate when generating biographies and answering medical questions, respectively.
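The following Python sketch illustrates, under stated assumptions, how the pieces named above could fit together: a factuality score computed as the fraction of claims judged correct, preference pairs built by ranking sampled responses to the same prompt by that score, and a per-pair DPO loss. The pairing rule (every pair with differing scores), the `beta` value, and all function names are hypothetical; the loss is the standard DPO formulation, not necessarily the exact training code used in this work. The claim-level correctness labels are assumed to come from some external judge (retrieval-based or confidence-based), which is not modeled here.

```python
import math
from itertools import combinations
from typing import List, Tuple


def factuality(claim_is_correct: List[bool]) -> float:
    """Fraction of extracted claims judged correct (0.0 if no claims).
    The labels are assumed to come from an external fact-checking judge."""
    if not claim_is_correct:
        return 0.0
    return sum(claim_is_correct) / len(claim_is_correct)


def preference_pairs(responses: List[str], scores: List[float]) -> List[Tuple[str, str]]:
    """Hypothetical pairing rule: for every pair of sampled responses to the
    same prompt whose factuality scores differ, return (chosen, rejected)
    with the more factual response as 'chosen'. Ties produce no pair."""
    pairs = []
    for (r_a, s_a), (r_b, s_b) in combinations(zip(responses, scores), 2):
        if s_a > s_b:
            pairs.append((r_a, r_b))
        elif s_b > s_a:
            pairs.append((r_b, r_a))
    return pairs


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, given sequence
    log-probabilities under the policy (pi_*) and a frozen reference model
    (ref_*): -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return _softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative drop in error rate, assuming error rate = 1 - factuality."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Three hypothetical sampled responses with per-claim correctness labels.
    scores = [factuality([True, True, False, True]),    # 0.75
              factuality([True, False, False, False]),  # 0.25
              factuality([True, True, True, True])]     # 1.0
    print(preference_pairs(["A", "B", "C"], scores))
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))  # log(2) when policy equals reference
```

With these hypothetical scores, `preference_pairs` returns `[("A", "B"), ("C", "A"), ("C", "B")]`. Minimizing `dpo_loss` raises the policy's likelihood of the more factual response relative to the reference model, which is why the loss falls below `log(2)` once the chosen log-ratio exceeds the rejected one.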