Trusted Source Alignment in Large Language Models
November 12, 2023
Authors: Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner
cs.AI
Abstract
Large language models (LLMs) are trained on web-scale corpora that inevitably
include contradictory factual information from sources of varying reliability.
In this paper, we propose measuring an LLM property called trusted source
alignment (TSA): the model's propensity to align with content produced by
trusted publishers in the face of uncertainty or controversy. We present
FactCheckQA, a TSA evaluation dataset based on a corpus of fact-checking
articles. We describe a simple protocol for evaluating TSA and offer a detailed
analysis of design considerations including response extraction, claim
contextualization, and bias in prompt formulation. Applying the protocol to
PaLM-2, we find that as we scale up the model size, the model performance on
FactCheckQA improves from near-random to up to 80% balanced accuracy in
aligning with trusted sources.
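
The abstract does not spell out the evaluation protocol in detail, so the following is only a minimal sketch of what a TSA-style evaluation could look like under simple assumptions: pose each fact-checked claim to a model as a yes/no question, extract a binary answer from the response, and compare it against the trusted publisher's verdict using balanced accuracy. The names (`evaluate_tsa`, `model_fn`), the prompt wording, and the naive response extraction are illustrative assumptions, not the paper's actual FactCheckQA protocol.

```python
from typing import Callable, List, Tuple


def evaluate_tsa(
    model_fn: Callable[[str], str],
    examples: List[Tuple[str, bool]],
) -> float:
    """Score a model against fact-check verdicts and return balanced accuracy.

    examples: (claim text, True if the trusted source rates the claim as true).
    model_fn: any callable that maps a prompt string to a response string.
    """
    tp = tn = fp = fn = 0
    for claim, verdict_true in examples:
        # Pose the claim as a yes/no question. This is one of many possible
        # prompt formulations; the paper analyzes how such choices can bias responses.
        prompt = f'Is the following claim true? Answer "yes" or "no".\nClaim: {claim}'
        response = model_fn(prompt).strip().lower()
        # Naive response extraction: treat a leading "yes" as agreement with the claim.
        predicted_true = response.startswith("yes")
        if verdict_true and predicted_true:
            tp += 1
        elif verdict_true and not predicted_true:
            fn += 1
        elif not verdict_true and predicted_true:
            fp += 1
        else:
            tn += 1
    # Balanced accuracy: mean of recall on true claims and recall on false claims,
    # so a model that always answers "yes" (or "no") scores only 0.5.
    recall_true = tp / (tp + fn) if (tp + fn) else 0.0
    recall_false = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (recall_true + recall_false)


if __name__ == "__main__":
    # Toy stand-in "model" that always answers "yes", for demonstration only.
    always_yes = lambda prompt: "yes"
    demo = [
        ("The Earth orbits the Sun.", True),
        ("The Moon is made of cheese.", False),
    ]
    print(evaluate_tsa(always_yes, demo))  # 0.5, i.e. chance-level balanced accuracy
```

In practice, `model_fn` would wrap an actual LLM call, and response extraction would need to handle hedged or free-form answers, which is one of the design considerations the paper examines.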