Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences
February 3, 2025
Authors: Vaishnavi Shrivastava, Ananya Kumar, Percy Liang
cs.AI
Abstract
Language models (LMs) should provide reliable confidence estimates to help
users detect mistakes in their outputs and defer to human experts when
necessary. Asking a language model to assess its confidence ("Score your
confidence from 0-1.") is a natural way of evaluating its uncertainty. However,
models struggle to provide absolute assessments of confidence (i.e. judging
confidence in answering a question independent of other questions) and the
coarse-grained scores they produce are not useful for evaluating the
correctness of their answers. We propose relative confidence estimation, where
we match up questions against each other and ask the model to make relative
judgments of confidence ("Which question are you more confident in answering
correctly?"). Treating each question as a "player" in a series of matchups
against other questions and the model's preferences as match outcomes, we can
use rank aggregation methods like Elo rating and Bradley-Terry to translate the
model's confidence preferences into confidence scores. We evaluate relative
confidence estimation against absolute confidence estimation and
self-consistency confidence methods on five state-of-the-art LMs -- GPT-4,
GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.1 405B -- across 14
challenging STEM, social science, and commonsense reasoning question answering
tasks. Our results demonstrate that relative confidence estimation consistently
provides more reliable confidence scores than absolute confidence estimation,
with average gains of 3.5% in selective classification AUC over direct absolute
confidence estimation methods and 1.7% over self-consistency approaches across
all models and datasets.
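To make the matchup-and-aggregation idea concrete, below is a minimal sketch (not the authors' released code) of the pipeline the abstract describes: aggregate pairwise confidence preferences into per-question scores with Bradley-Terry, then evaluate those scores with selective classification AUC (average accuracy over coverage levels when answering only the most confident questions). The win matrix here is filled with synthetic preferences; in practice each entry would come from prompting the LM with the relative question ("Which question are you more confident in answering correctly?"), and Elo rating would be a drop-in alternative aggregator. All function names are illustrative.

```python
import numpy as np

def bradley_terry_scores(wins: np.ndarray, n_iters: int = 200) -> np.ndarray:
    """Fit Bradley-Terry strengths from a pairwise win matrix.

    wins[i, j] = number of matchups in which the model preferred question i
    over question j. Uses the standard minorization-maximization updates.
    """
    n = wins.shape[0]
    p = np.ones(n)  # uniform initial strengths
    for _ in range(n_iters):
        p_new = np.empty(n)
        for i in range(n):
            total_wins = wins[i].sum()
            # Sum over opponents of (matches played) / (combined strength).
            denom = sum(
                (wins[i, j] + wins[j, i]) / (p[i] + p[j])
                for j in range(n) if j != i
            )
            p_new[i] = total_wins / denom if denom > 0 else p[i]
        p = p_new / p_new.sum()       # normalize to keep the scale fixed
        p = np.maximum(p, 1e-12)      # avoid zero strengths in later updates
    return p  # higher strength = relatively higher confidence

def selective_classification_auc(confidence: np.ndarray,
                                 correct: np.ndarray) -> float:
    """Area under the accuracy-coverage curve: for each coverage k, answer
    only the top-k most confident questions, then average the accuracies."""
    order = np.argsort(-confidence)
    hits = correct[order].astype(float)
    coverage_acc = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(coverage_acc.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_questions = 20
    # Synthetic stand-in for LM preferences: each question has a latent
    # "ease", and a matchup is won stochastically by the easier question.
    ease = rng.normal(size=n_questions)
    wins = np.zeros((n_questions, n_questions))
    for i in range(n_questions):
        for j in range(i + 1, n_questions):
            p_i_beats_j = 1.0 / (1.0 + np.exp(ease[j] - ease[i]))
            if rng.random() < p_i_beats_j:
                wins[i, j] += 1
            else:
                wins[j, i] += 1
    scores = bradley_terry_scores(wins)
    # Synthetic correctness correlated with ease, for illustration only.
    correct = (ease + 0.5 * rng.normal(size=n_questions)) > 0
    print("Selective classification AUC:",
          round(selective_classification_auc(scores, correct), 3))
```

Because the fitted strengths are normalized, only their relative ordering matters for selective classification, which is exactly what the pairwise preferences are meant to recover.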