FLAME: Factuality-Aware Alignment for Large Language Models
May 2, 2024
Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen
cs.AI
Abstract
Alignment is a standard procedure to fine-tune pre-trained large language
models (LLMs) to follow natural language instructions and serve as helpful AI
assistants. We have observed, however, that the conventional alignment process
fails to enhance the factual accuracy of LLMs, and often leads to the
generation of more false facts (i.e. hallucination). In this paper, we study
how to make the LLM alignment process more factual, by first identifying
factors that lead to hallucination in both alignment steps: supervised
fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that
training the LLM on new knowledge or unfamiliar texts can encourage
hallucination. This makes SFT less factual, as it trains on human-labeled data
that may be novel to the LLM. Furthermore, the reward functions used in standard RL
can also encourage hallucination, because they guide the LLM to provide more
helpful responses on a diverse set of instructions, often preferring longer and
more detailed responses. Based on these observations, we propose
factuality-aware alignment, comprised of factuality-aware SFT and
factuality-aware RL through direct preference optimization. Experiments show
that our proposed factuality-aware alignment guides LLMs to output more factual
responses while maintaining instruction-following capability.
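For reference, the factuality-aware RL step described above builds on direct preference optimization (DPO). Below is a minimal sketch of the standard DPO objective (Rafailov et al., 2023); the symbols x (prompt), y_w and y_l (preferred and dispreferred responses), \pi_\theta (policy being tuned), \pi_{ref} (reference model, typically the SFT model), \beta (KL-strength coefficient), and \sigma (logistic sigmoid) follow the usual DPO notation and are not taken from the abstract itself:

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]

A factuality-aware variant would presumably construct the preference pairs (y_w, y_l) using a factuality signal in addition to the usual helpfulness preference; the specific pair-construction scheme is part of the paper's method and is not detailed in the abstract.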