Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
November 15, 2023
Authors: Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
cs.AI
Abstract
To maintain user trust, large language models (LLMs) should signal low
confidence on examples where they are incorrect, instead of misleading the
user. The standard approach to estimating confidence is to use the softmax
probabilities of these models, but as of November 2023, state-of-the-art LLMs
such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We
first study eliciting confidence linguistically -- asking an LLM for its
confidence in its answer -- which performs reasonably (80.5% AUC on GPT-4
averaged across 12 question-answering datasets -- 7% above a random baseline)
but leaves room for improvement. We then explore using a surrogate confidence
model -- using a model where we do have probabilities to evaluate the original
model's confidence in a given question. Surprisingly, even though these
probabilities come from a different and often weaker model, this method leads
to higher AUC than linguistic confidences on 9 out of 12 datasets. Our best
method, which composes linguistic confidences and surrogate model probabilities,
gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on
GPT-4).
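
As a rough illustration of the two signals the abstract describes, the minimal Python sketch below asks a closed model (e.g., GPT-4) for a linguistic confidence, scores the same answer with an open surrogate model whose probabilities are accessible (e.g., a Llama variant), and interpolates the two. The callables `ask_model` and `surrogate_prob`, the prompt wording, and the weight `alpha` are illustrative assumptions, not the paper's exact recipe.

```python
def linguistic_confidence(question: str, answer: str, ask_model) -> float:
    """Ask the closed model to rate its own confidence in [0, 1].

    `ask_model` is a hypothetical callable that sends a prompt string and
    returns the model's text reply.
    """
    reply = ask_model(
        f"Question: {question}\nProposed answer: {answer}\n"
        "On a scale from 0.0 to 1.0, how confident are you that the proposed "
        "answer is correct? Reply with a single number."
    )
    try:
        # Clamp the parsed score to [0, 1] in case the model replies out of range.
        return min(max(float(reply.strip()), 0.0), 1.0)
    except ValueError:
        return 0.5  # fall back to an uninformative confidence if parsing fails


def surrogate_confidence(question: str, answer: str, surrogate_prob) -> float:
    """Score the same answer with a surrogate model that exposes probabilities.

    `surrogate_prob` is a hypothetical callable returning the surrogate's
    probability of `answer` given `question` (e.g., from its softmax outputs).
    """
    return surrogate_prob(question, answer)


def composed_confidence(question: str, answer: str,
                        ask_model, surrogate_prob, alpha: float = 0.5) -> float:
    """Combine the two signals with a simple weighted average.

    `alpha` is an assumed interpolation weight that would be tuned on held-out
    data; the paper's actual composition method is described in the full text.
    """
    ling = linguistic_confidence(question, answer, ask_model)
    surr = surrogate_confidence(question, answer, surrogate_prob)
    return alpha * ling + (1.0 - alpha) * surr
```

Under this sketch, the composed score would then be evaluated by ranking correct versus incorrect answers (e.g., via AUC), mirroring how the abstract reports confidence quality across the 12 question-answering datasets.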