Calibrating LLM-Based Evaluator

Abstract

Recent advances in the language modeling and emergent capabilities of large language models (LLMs) make them promising reference-free evaluators of natural language generation quality and a competent alternative to human evaluation. However, because these models are often closed-source or computationally expensive to host and tune, there is little established practice for further calibrating an off-the-shelf LLM-based evaluator toward better human alignment. In this work, we propose AutoCalibrate, a multi-stage, gradient-free approach that automatically calibrates and aligns an LLM-based evaluator with human preference. Instead of modeling human preferences explicitly, we first capture them implicitly in a set of human labels. The language model itself then drafts an initial set of scoring criteria, leveraging in-context learning on different few-shot examples. To calibrate this set further, we select the best-performing criteria and re-draft them through self-refinement. Our experiments on multiple text quality evaluation datasets show a significant improvement in correlation with expert evaluation after calibration. Our comprehensive qualitative analysis offers intuitions and observations on what makes scoring criteria effective.
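The draft, select, and refine stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the callables `draft`, `score`, and `refine` are hypothetical stand-ins for LLM calls, the parameter names and defaults are invented for illustration, and Spearman rank correlation is an assumed agreement measure, since the abstract only says "correlation with expert evaluation".

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, float]  # (text to evaluate, human-assigned score)


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return 0.0  # constant scores carry no ranking information
    return cov / (va * vb) ** 0.5


def calibrate(
    samples: Sequence[Sample],
    draft: Callable[[List[Sample]], str],   # hypothetical: LLM drafts criteria from few-shot examples
    score: Callable[[str, str], float],     # hypothetical: LLM scores a text under given criteria
    refine: Callable[[str], str],           # hypothetical: LLM rewrites criteria (self-refinement)
    n_drafts: int = 4,
    shots: int = 3,
    keep: int = 2,
    seed: int = 0,
) -> str:
    """Gradient-free calibration sketch: draft, select, refine, re-select."""
    rng = random.Random(seed)
    human = [h for _, h in samples]

    def agreement(criteria: str) -> float:
        model = [score(criteria, text) for text, _ in samples]
        return spearman(model, human)

    # Stage 1: draft candidate criteria from different few-shot subsets.
    candidates = [draft(rng.sample(list(samples), shots)) for _ in range(n_drafts)]
    # Stage 2: keep the candidates that agree best with the human labels.
    best = sorted(candidates, key=agreement, reverse=True)[:keep]
    # Stage 3: refine the survivors and return the best of old and new.
    pool = best + [refine(c) for c in best]
    return max(pool, key=agreement)
```

Because the loop only compares candidate criteria by their agreement with human labels, it needs no access to model weights or gradients, which matches the gradient-free setting the abstract describes for closed-source evaluators.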
