Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
November 15, 2023
Authors: Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
cs.AI
Abstract
To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user. The standard approach to estimating confidence is to use the softmax probabilities of these models, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We first study eliciting confidence linguistically -- asking an LLM for its confidence in its answer -- which performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets, 7% above a random baseline) but leaves room for improvement. We then explore using a surrogate confidence model -- a model whose probabilities we can access -- to evaluate the original model's confidence in a given question. Surprisingly, even though these probabilities come from a different and often weaker model, this method yields higher AUC than linguistic confidences on 9 out of 12 datasets. Our best method, which composes linguistic confidences and surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
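
The abstract describes three confidence signals: a linguistically elicited confidence from the API-only model, a probability from a surrogate model whose softmax outputs are accessible, and a composition of the two. The sketch below illustrates one plausible way these signals could be produced and blended; the prompt wording, the surrogate scoring rule, and the 50/50 mixing weight are assumptions rather than the paper's implementation, and `ask_main_model` / `surrogate_answer_prob` are hypothetical callables standing in for real model APIs.

```python
# Illustrative sketch only: combining a verbalized confidence from an
# API-only model with a surrogate model's probability. Details (prompt,
# scoring rule, mixing weight) are assumptions, not the authors' method.

def linguistic_confidence(ask_main_model, question, answer):
    """Ask the main LLM (e.g., GPT-4) to rate its own confidence in `answer`.

    `ask_main_model` is a hypothetical callable that sends a prompt to the
    API-only model and returns its text response.
    """
    prompt = (
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "On a scale from 0.0 to 1.0, how confident are you that the proposed "
        "answer is correct? Reply with a single number."
    )
    reply = ask_main_model(prompt)
    try:
        # Clamp to [0, 1] in case the model replies with an out-of-range value.
        return min(max(float(reply.strip()), 0.0), 1.0)
    except ValueError:
        return 0.5  # fall back to an uninformative confidence


def surrogate_confidence(surrogate_answer_prob, question, answer):
    """Score the main model's answer with a surrogate model whose softmax
    probabilities are accessible (e.g., an open-weight Llama).

    `surrogate_answer_prob` is a hypothetical callable returning the
    surrogate's probability of `answer` given `question`.
    """
    return surrogate_answer_prob(question, answer)


def composed_confidence(ling_conf, surr_conf, weight=0.5):
    """Blend the two signals; a simple convex combination is one plausible
    way to compose them (the weighting here is an assumption)."""
    return weight * ling_conf + (1.0 - weight) * surr_conf
```

In a selective-prediction setting, such a composed score would be used to rank the main model's answers so that low-confidence answers can be flagged or abstained on; AUC (as reported in the abstract) measures how well the score separates correct from incorrect answers.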