Large Language Model Confidence Estimation via Black-Box Access

June 1, 2024
Authors: Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri
cs.AI

Abstract

Estimating uncertainty or confidence in the responses of a model can be significant in evaluating trust not only in the responses, but also in the model as a whole. In this paper, we explore the problem of estimating confidence for responses of large language models (LLMs) with only black-box or query access to them. We propose a simple and extensible framework in which we engineer novel features and train an (interpretable) model (viz., logistic regression) on these features to estimate confidence. We empirically demonstrate that our simple framework is effective at estimating the confidence of flan-ul2, llama-13b, and mistral-7b, consistently outperforming existing black-box confidence estimation approaches on benchmark datasets such as TriviaQA, SQuAD, CoQA, and Natural Questions, in some cases by more than 10% (in terms of AUROC). Additionally, our interpretable approach provides insight into which features are predictive of confidence, leading to the interesting and useful discovery that confidence models built for one LLM generalize zero-shot to other LLMs on a given dataset.
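To make the described pipeline concrete, here is a minimal sketch of its two stages: engineer features using only query (black-box) access to an LLM, then fit an interpretable logistic-regression confidence model on them. The specific features below (agreement across resampled answers, mean response length), the `query_llm`/`fake_llm` interface, and the placeholder correctness labels are illustrative assumptions, not the paper's actual feature set or experimental setup.

```python
# Sketch of black-box confidence estimation: features from query-only LLM
# access, plus a logistic-regression confidence model. Feature choices here
# are hypothetical stand-ins for the paper's engineered features.
from collections import Counter
import numpy as np
from sklearn.linear_model import LogisticRegression

def engineer_features(query_llm, prompt, n_samples=5):
    """Build features using only black-box access.
    `query_llm(prompt) -> str` is an assumed caller-supplied interface."""
    answers = [query_llm(prompt) for _ in range(n_samples)]
    counts = Counter(a.strip().lower() for a in answers)
    top_frac = counts.most_common(1)[0][1] / n_samples  # self-consistency
    mean_len = float(np.mean([len(a.split()) for a in answers]))
    return [top_frac, mean_len]

# Toy stand-in for a real LLM endpoint so the sketch runs end to end.
rng = np.random.default_rng(0)
def fake_llm(prompt):
    return str(rng.choice(["Paris", "Paris", "Lyon"]))

prompts = [f"Q{i}: What is the capital of France?" for i in range(40)]
X = np.array([engineer_features(fake_llm, p) for p in prompts])
# Placeholder labels; in practice y marks whether each response matched a
# gold answer (e.g., from TriviaQA or Natural Questions).
y = (X[:, 0] > 0.6).astype(int)

clf = LogisticRegression().fit(X, y)
confidence = clf.predict_proba(X)[:, 1]  # estimated P(response is correct)
print(clf.coef_)  # interpretable: which features drive predicted confidence
```

Because the confidence model is a plain logistic regression over LLM-agnostic, query-derived features, its learned coefficients can be inspected directly, which is what makes the zero-shot transfer of a confidence model from one LLM to another observable in the first place.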
