Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation

Abstract

To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, instead of misleading the user. The standard approach to estimating confidence is to use the softmax probabilities of these models, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We first study eliciting confidence linguistically, that is, asking an LLM for its confidence in its answer. This performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets, 7% above a random baseline) but leaves room for improvement. We then explore using a surrogate confidence model: a model whose probabilities we can access is used to evaluate the original model's confidence in a given question. Surprisingly, even though these probabilities come from a different and often weaker model, this method leads to higher AUC than linguistic confidences on 9 out of 12 datasets. Our best method, which composes linguistic confidences and surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
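As a rough illustration of how such confidence estimates could be compared, the sketch below scores three confidence signals on a toy example. It rests on several assumptions not stated in the abstract: AUC is taken to mean the area under the accuracy-coverage curve (where a random ordering scores about the model's plain accuracy), the composition is a simple weighted average with a hypothetical weight `alpha`, and all numbers are invented for illustration. The function names `selective_auc` and `mix_confidences` are hypothetical, not the authors' code.

```python
from typing import List, Sequence


def selective_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean accuracy over the top-k most confident answers, for k = 1..n.

    Assumed reading of "AUC": area under the accuracy-coverage curve.
    Ties keep their input order (Python's sort is stable).
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        total += hits / k  # accuracy when answering only the top-k examples
    return total / len(order)


def mix_confidences(
    linguistic: Sequence[float], surrogate: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Weighted average of two confidence signals (alpha is a hypothetical weight)."""
    if len(linguistic) != len(surrogate):
        raise ValueError("inputs must have equal length")
    return [alpha * l + (1 - alpha) * s for l, s in zip(linguistic, surrogate)]


# Toy data (invented): whether the main model answered each question correctly.
correct = [True, True, False, True, False]
# Verbalized confidences tend to be coarse, so many examples tie.
linguistic = [0.9, 0.9, 0.9, 0.8, 0.8]
# Probabilities from a separate model that exposes them.
surrogate = [0.7, 0.6, 0.2, 0.5, 0.3]

auc_linguistic = selective_auc(linguistic, correct)
auc_surrogate = selective_auc(surrogate, correct)
auc_mixed = selective_auc(mix_confidences(linguistic, surrogate), correct)
print(auc_linguistic, auc_surrogate, auc_mixed)
```

In this toy setup the coarse verbalized scores cannot separate the tied examples, while the surrogate probabilities can. That is one plausible intuition for why combining the two signals might help, though it is not evidence for the reported results.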
