Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
November 15, 2023
Authors: Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
cs.AI
Abstract
To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user. The standard approach to estimating confidence is to use the softmax probabilities of these models, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We first study eliciting confidence linguistically -- asking an LLM for its confidence in its answer -- which performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets, 7% above a random baseline) but leaves room for improvement. We then explore using a surrogate confidence model -- a model whose probabilities we can access -- to evaluate the original model's confidence in a given question. Surprisingly, even though these probabilities come from a different and often weaker model, this method yields higher AUC than linguistic confidences on 9 out of 12 datasets. Our best method, which composes linguistic confidences and surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
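
The abstract describes three confidence signals: a linguistically elicited confidence from the API-only model, a probability from a surrogate model whose softmax outputs are accessible, and a composition of the two. The sketch below illustrates one plausible way these signals could be produced and blended; the prompt wording, the surrogate scoring rule, and the 50/50 mixing weight are assumptions rather than the paper's implementation, and `ask_main_model` / `surrogate_answer_prob` are hypothetical callables standing in for real model APIs.

```python
# Illustrative sketch only: combining a verbalized confidence from an
# API-only model with a surrogate model's probability. Details (prompt,
# scoring rule, mixing weight) are assumptions, not the authors' method.

def linguistic_confidence(ask_main_model, question, answer):
    """Ask the main LLM (e.g., GPT-4) to rate its own confidence in `answer`.

    `ask_main_model` is a hypothetical callable that sends a prompt to the
    API-only model and returns its text response.
    """
    prompt = (
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "On a scale from 0.0 to 1.0, how confident are you that the proposed "
        "answer is correct? Reply with a single number."
    )
    reply = ask_main_model(prompt)
    try:
        # Clamp to [0, 1] in case the model replies with an out-of-range value.
        return min(max(float(reply.strip()), 0.0), 1.0)
    except ValueError:
        return 0.5  # fall back to an uninformative confidence


def surrogate_confidence(surrogate_answer_prob, question, answer):
    """Score the main model's answer with a surrogate model whose softmax
    probabilities are accessible (e.g., an open-weight Llama).

    `surrogate_answer_prob` is a hypothetical callable returning the
    surrogate's probability of `answer` given `question`.
    """
    return surrogate_answer_prob(question, answer)


def composed_confidence(ling_conf, surr_conf, weight=0.5):
    """Blend the two signals; a simple convex combination is one plausible
    way to compose them (the weighting here is an assumption)."""
    return weight * ling_conf + (1.0 - weight) * surr_conf
```

In a selective-prediction setting, such a composed score would be used to rank the main model's answers so that low-confidence answers can be flagged or abstained on; AUC (as reported in the abstract) measures how well the score separates correct from incorrect answers.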