Trusted Source Alignment in Large Language Models
November 12, 2023
Authors: Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner
cs.AI
Abstract
Large language models (LLMs) are trained on web-scale corpora that inevitably
include contradictory factual information from sources of varying reliability.
In this paper, we propose measuring an LLM property called trusted source
alignment (TSA): the model's propensity to align with content produced by
trusted publishers in the face of uncertainty or controversy. We present
FactCheckQA, a TSA evaluation dataset based on a corpus of fact checking
articles. We describe a simple protocol for evaluating TSA and offer a detailed
analysis of design considerations including response extraction, claim
contextualization, and bias in prompt formulation. Applying the protocol to
PaLM-2, we find that as we scale up the model size, the model performance on
FactCheckQA improves from near-random to up to 80% balanced accuracy in
aligning with trusted sources.
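
The abstract only names the protocol's design considerations (response extraction, claim contextualization, prompt formulation). As a minimal sketch of how such an evaluation might be wired up, and not the paper's actual implementation, the Python below turns each fact-checked claim into a contextualized yes/no prompt, extracts a verdict from the model's free-text response, and scores balanced accuracy against the trusted publisher's rating. The prompt wording, the example fields (claim, context, label), and the query_model callable are assumptions made for illustration.

    from typing import Callable, Optional


    def make_prompt(claim: str, context: str = "") -> str:
        # Hypothetical prompt template: contextualize the claim (e.g. with the
        # date and origin reported by the fact checker) and ask for a yes/no
        # verdict on its accuracy.
        return (
            f"{context}\n"
            f'Claim: "{claim}"\n'
            "Is this claim accurate? Answer yes or no."
        )


    def extract_verdict(response: str) -> Optional[bool]:
        # Hypothetical response extraction: map free-form model output to a
        # boolean verdict; return None when no clear answer can be read off.
        text = response.strip().lower()
        if text.startswith("yes"):
            return True
        if text.startswith("no"):
            return False
        return None


    def balanced_accuracy(examples, query_model: Callable[[str], str]) -> float:
        # Balanced accuracy = mean of per-class recalls, so the score is not
        # inflated by the true/false imbalance typical of fact-check corpora.
        correct = {True: 0, False: 0}
        total = {True: 0, False: 0}
        for ex in examples:  # each ex: {"claim": str, "context": str, "label": bool}
            response = query_model(make_prompt(ex["claim"], ex.get("context", "")))
            total[ex["label"]] += 1
            if extract_verdict(response) == ex["label"]:
                correct[ex["label"]] += 1
        recalls = [correct[k] / total[k] for k in (True, False) if total[k] > 0]
        return sum(recalls) / len(recalls)

Under these assumptions, a run over a set of FactCheckQA-style examples would simply be balanced_accuracy(examples, my_model) with whatever model wrapper is available; averaging per-class recall means a model that always answers "yes" scores 50% rather than profiting from any imbalance between true and false claims.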