Trusted Source Alignment in Large Language Models

Abstract

Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact-checking articles. We describe a simple protocol for evaluating TSA and offer a detailed analysis of design considerations, including response extraction, claim contextualization, and bias in prompt formulation. Applying the protocol to PaLM-2, we find that as model size increases, performance on FactCheckQA improves from near-random to up to 80% balanced accuracy in aligning with trusted sources.
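Results are reported as balanced accuracy rather than plain accuracy. The following is a minimal sketch, assuming (not confirmed by this abstract) that each FactCheckQA claim carries a binary true/false verdict from the trusted source and that the model's answer has already been extracted as a boolean. Balanced accuracy is then the mean of the recall on true claims and the recall on false claims. The function name `balanced_accuracy` is hypothetical and does not come from the paper's code.

```python
from typing import Sequence


def balanced_accuracy(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """Mean of per-class recall over the two classes (True and False).

    Hypothetical helper: `labels` are assumed trusted-source verdicts per claim
    (True = claim rated true); `predictions` are the model's extracted answers.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    recalls = []
    for cls in (True, False):
        # Indices of claims whose trusted-source verdict equals `cls`.
        idx = [i for i, y in enumerate(labels) if y == cls]
        if not idx:
            continue  # class absent from the data: skip instead of dividing by zero
        correct = sum(1 for i in idx if predictions[i] == cls)
        recalls.append(correct / len(idx))
    if not recalls:
        raise ValueError("no examples to score")
    return sum(recalls) / len(recalls)


# A model that always answers "false" on 9 false claims and 1 true claim.
labels = [False] * 9 + [True]
predictions = [False] * 10
print(balanced_accuracy(labels, predictions))  # 0.5 (plain accuracy would be 0.9)
```

Because the metric averages recall per class, a constant or random answering strategy scores about 0.5 regardless of how skewed the true/false split is. This is what "near-random" refers to when comparing smaller and larger models.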