FLAME: Factuality-Aware Alignment for Large Language Models
May 2, 2024
作者: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen
cs.AI
Abstract
Alignment is a standard procedure to fine-tune pre-trained large language
models (LLMs) to follow natural language instructions and serve as helpful AI
assistants. We have observed, however, that the conventional alignment process
fails to enhance the factual accuracy of LLMs, and often leads to the
generation of more false facts (i.e. hallucination). In this paper, we study
how to make the LLM alignment process more factual, by first identifying
factors that lead to hallucination in both alignment steps: supervised
fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that
training the LLM on new knowledge or unfamiliar texts can encourage
hallucination. This makes SFT less factual as it trains on human labeled data
that may be novel to the LLM. Furthermore, reward functions used in standard RL
can also encourage hallucination, because they guide the LLM to provide more
helpful responses on a diverse set of instructions, often preferring longer and
more detailed responses. Based on these observations, we propose
factuality-aware alignment, comprised of factuality-aware SFT and
factuality-aware RL through direct preference optimization. Experiments show
that our proposed factuality-aware alignment guides LLMs to output more factual
responses while maintaining instruction-following capability.
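The factuality-aware RL step above builds on the direct preference optimization (DPO) objective, which trains the policy directly on preference pairs (here, a more factual response preferred over a less factual one) without a separate reward model. The following is a minimal sketch of the standard DPO loss for a single pair; the function name, arguments, and the toy log-probability values are illustrative assumptions, not taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are summed token log-probabilities of a full response under
    the policy being trained; ref_logp_* are the same quantities under
    the frozen reference (SFT) model. beta scales the implicit reward.
    """
    # Implicit reward of each response: beta-scaled log-ratio vs. the reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Bradley-Terry negative log-likelihood that chosen beats rejected.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy pair (hypothetical numbers): the policy has shifted probability mass
# toward the factual (chosen) response relative to the reference model,
# so the margin is positive and the loss is below log(2).
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-14.0,
                ref_logp_chosen=-12.0, ref_logp_rejected=-12.0)
```

Minimizing this loss increases the policy's log-probability margin of the preferred (more factual) response over the rejected one, relative to the reference model, which is how the paper's factuality preference signal can be injected without a standard RL reward loop.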