FLAME: Factuality-Aware Alignment for Large Language Models
May 2, 2024
作者: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen
cs.AI
Abstract
Alignment is a standard procedure to fine-tune pre-trained large language
models (LLMs) to follow natural language instructions and serve as helpful AI
assistants. We have observed, however, that the conventional alignment process
fails to enhance the factual accuracy of LLMs, and often leads to the
generation of more false facts (i.e. hallucination). In this paper, we study
how to make the LLM alignment process more factual, by first identifying
factors that lead to hallucination in both alignment steps: supervised
fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that
training the LLM on new knowledge or unfamiliar texts can encourage
hallucination. This makes SFT less factual as it trains on human labeled data
that may be novel to the LLM. Furthermore, reward functions used in standard RL
can also encourage hallucination, because they guide the LLM to provide more
helpful responses on a diverse set of instructions, often preferring longer and
more detailed responses. Based on these observations, we propose
factuality-aware alignment, comprised of factuality-aware SFT and
factuality-aware RL through direct preference optimization. Experiments show
that our proposed factuality-aware alignment guides LLMs to output more factual
responses while maintaining instruction-following capability.
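The factuality-aware RL step above builds on the direct preference optimization (DPO) objective, which trains the policy directly on preference pairs (here, a more factual response preferred over a less factual one) without a separate reward model. The following is a minimal sketch of the standard DPO loss for a single pair; the function name, arguments, and the toy log-probability values are illustrative assumptions, not taken from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are summed token log-probabilities of a full response under
    the policy being trained; ref_logp_* are the same quantities under
    the frozen reference (SFT) model. beta scales the implicit reward.
    """
    # Implicit reward of each response: beta-scaled log-ratio vs. the reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Bradley-Terry negative log-likelihood that chosen beats rejected.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy pair (hypothetical numbers): the policy has shifted probability mass
# toward the factual (chosen) response relative to the reference model,
# so the margin is positive and the loss is below log(2).
loss = dpo_loss(logp_chosen=-10.0, logp_rejected=-14.0,
                ref_logp_chosen=-12.0, ref_logp_rejected=-12.0)
```

Minimizing this loss increases the policy's log-probability margin of the preferred (more factual) response over the rejected one, relative to the reference model, which is how the paper's factuality preference signal can be injected without a standard RL reward loop.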