Trusted Source Alignment in Large Language Models
November 12, 2023
Authors: Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner
cs.AI
Abstract
Large language models (LLMs) are trained on web-scale corpora that inevitably
include contradictory factual information from sources of varying reliability.
In this paper, we propose measuring an LLM property called trusted source
alignment (TSA): the model's propensity to align with content produced by
trusted publishers in the face of uncertainty or controversy. We present
FactCheckQA, a TSA evaluation dataset based on a corpus of fact-checking
articles. We describe a simple protocol for evaluating TSA and offer a detailed
analysis of design considerations including response extraction, claim
contextualization, and bias in prompt formulation. Applying the protocol to
PaLM-2, we find that as we scale up the model size, the model performance on
FactCheckQA improves from near-random to up to 80% balanced accuracy in
aligning with trusted sources.
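
The abstract does not spell out the evaluation protocol in detail, so the following is only a minimal sketch of what a TSA-style evaluation could look like under simple assumptions: pose each fact-checked claim to a model as a yes/no question, extract a binary answer from the response, and compare it against the trusted publisher's verdict using balanced accuracy. The names (`evaluate_tsa`, `model_fn`), the prompt wording, and the naive response extraction are illustrative assumptions, not the paper's actual FactCheckQA protocol.

```python
from typing import Callable, List, Tuple


def evaluate_tsa(
    model_fn: Callable[[str], str],
    examples: List[Tuple[str, bool]],
) -> float:
    """Score a model against fact-check verdicts and return balanced accuracy.

    examples: (claim text, True if the trusted source rates the claim as true).
    model_fn: any callable that maps a prompt string to a response string.
    """
    tp = tn = fp = fn = 0
    for claim, verdict_true in examples:
        # Pose the claim as a yes/no question. This is one of many possible
        # prompt formulations; the paper analyzes how such choices can bias responses.
        prompt = f'Is the following claim true? Answer "yes" or "no".\nClaim: {claim}'
        response = model_fn(prompt).strip().lower()
        # Naive response extraction: treat a leading "yes" as agreement with the claim.
        predicted_true = response.startswith("yes")
        if verdict_true and predicted_true:
            tp += 1
        elif verdict_true and not predicted_true:
            fn += 1
        elif not verdict_true and predicted_true:
            fp += 1
        else:
            tn += 1
    # Balanced accuracy: mean of recall on true claims and recall on false claims,
    # so a model that always answers "yes" (or "no") scores only 0.5.
    recall_true = tp / (tp + fn) if (tp + fn) else 0.0
    recall_false = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (recall_true + recall_false)


if __name__ == "__main__":
    # Toy stand-in "model" that always answers "yes", for demonstration only.
    always_yes = lambda prompt: "yes"
    demo = [
        ("The Earth orbits the Sun.", True),
        ("The Moon is made of cheese.", False),
    ]
    print(evaluate_tsa(always_yes, demo))  # 0.5, i.e. chance-level balanced accuracy
```

In practice, `model_fn` would wrap an actual LLM call, and response extraction would need to handle hedged or free-form answers, which is one of the design considerations the paper examines.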