Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences

Abstract

Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary. Asking a language model to assess its confidence ("Score your confidence from 0-1.") is a natural way of evaluating its uncertainty. However, models struggle to provide absolute assessments of confidence (i.e., judging confidence in answering a question independent of other questions), and the coarse-grained scores they produce are not useful for evaluating the correctness of their answers. We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence ("Which question are you more confident in answering correctly?"). Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores. We evaluate relative confidence estimation against absolute confidence estimation and self-consistency confidence methods on five state-of-the-art LMs (GPT-4, GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.1 405B) across 14 challenging STEM, social science, and commonsense reasoning question answering tasks. Our results demonstrate that relative confidence estimation consistently provides more reliable confidence scores than absolute confidence estimation, with average gains of 3.5% in selective classification AUC over direct absolute confidence estimation methods and 1.7% over self-consistency approaches across all models and datasets.
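To make the rank aggregation step concrete, the following is a minimal sketch of how pairwise confidence preferences could be turned into per-question scores with a standard Elo update. It is an illustration under stated assumptions, not the authors' implementation: the function name `elo_confidence`, the hyperparameters `k`, `base`, and `epochs`, the min-max rescaling to [0, 1], and the toy matchup data are all hypothetical choices that the abstract does not specify.

```python
import random


def elo_confidence(n_questions, matchups, k=4.0, base=1000.0, epochs=20, seed=0):
    """Aggregate pairwise confidence preferences into scores in [0, 1].

    matchups: list of (winner, loser) question indices, meaning the model
    said it was more confident in answering `winner` than `loser`.
    All hyperparameters here are illustrative assumptions.
    """
    ratings = [base] * n_questions
    rng = random.Random(seed)
    games = list(matchups)
    for _ in range(epochs):
        # Elo is order dependent, so reshuffle the games on every pass.
        rng.shuffle(games)
        for winner, loser in games:
            # Standard Elo expected score of the winner given current ratings.
            gap = ratings[loser] - ratings[winner]
            expected_win = 1.0 / (1.0 + 10 ** (gap / 400.0))
            delta = k * (1.0 - expected_win)
            ratings[winner] += delta
            ratings[loser] -= delta
    # Min-max rescale the ratings so they can be read as confidence scores.
    lo, hi = min(ratings), max(ratings)
    if hi == lo:
        return [0.5] * n_questions
    return [(r - lo) / (hi - lo) for r in ratings]


# Toy example (made-up preferences): question 0 is preferred over 1 and 2,
# and question 1 is preferred over 2.
scores = elo_confidence(3, [(0, 1), (0, 2), (1, 2)])
print(scores)
```

The resulting scores only encode an ordering over questions, which is what a ranking-based metric such as selective classification AUC needs; a Bradley-Terry fit could be substituted for the Elo loop without changing the interface.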
