Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation

Abstract

To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, rather than mislead the user. The standard approach to estimating confidence is to use the softmax probabilities of these models, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities. We first study eliciting confidence linguistically, that is, asking an LLM for its confidence in its answer. This performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets, 7% above a random baseline) but leaves room for improvement. We then explore using a surrogate confidence model: a model whose probabilities we can access is used to evaluate the original model's confidence on a given question. Surprisingly, even though these probabilities come from a different and often weaker model, this method yields higher AUC than linguistic confidences on 9 out of 12 datasets. Our best method, which composes linguistic confidences and surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
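The abstract does not spell out how AUC is computed or how the two confidence signals are composed, so the following is only a minimal sketch under stated assumptions. It assumes AUC means the area under the accuracy-coverage curve of selective classification (consistent with a random baseline well above 50%), and it assumes the composition is a simple weighted average. The function names `selective_auc` and `mix_confidences`, the weight `alpha`, and all numbers are hypothetical and purely illustrative; they are not results from the paper.

```python
from typing import List, Sequence


def selective_auc(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Area under the accuracy-coverage curve (assumed meaning of AUC).

    Examples are ranked by confidence, highest first. For each coverage
    level k/n we take the accuracy on the k most confident examples,
    then average those accuracies. Ties keep their input order because
    Python's sort is stable.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        hits += correct[i]
        total += hits / k  # accuracy at coverage k/n
    return total / len(order)


def mix_confidences(
    linguistic: Sequence[float], surrogate: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Weighted average of the two signals (an assumed composition rule)."""
    if len(linguistic) != len(surrogate):
        raise ValueError("sequences must have equal length")
    return [alpha * l + (1 - alpha) * s for l, s in zip(linguistic, surrogate)]


if __name__ == "__main__":
    # Made-up toy data: 1 = the main model answered correctly, 0 = incorrectly.
    correct = [1, 0, 1, 1, 0]
    # Verbalized confidences tend to be coarse, so many examples tie.
    linguistic = [0.9, 0.9, 0.8, 0.9, 0.8]
    # Probabilities from a separate (hypothetical) surrogate model.
    surrogate = [0.8, 0.3, 0.6, 0.7, 0.2]
    mixed = mix_confidences(linguistic, surrogate, alpha=0.5)
    for name, conf in [("linguistic", linguistic),
                       ("surrogate", surrogate),
                       ("mixed", mixed)]:
        print(f"{name}: {selective_auc(conf, correct):.3f}")
```

One intuition the toy data is meant to show: when verbalized confidences cluster on a few values, the surrogate probabilities can break the ties, which may improve the ranking of correct over incorrect answers.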

