Towards Measuring the Representation of Subjective Global Opinions in Language Models

Abstract

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, composed of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
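
The abstract does not spell out the similarity metric, so the following is only a minimal sketch of how a country-conditioned similarity score could be computed. It assumes, purely for illustration, that similarity on one question is one minus the Jensen-Shannon distance between the model's distribution over answer options and the distribution of human responses from one country, averaged over questions. The function names `js_distance` and `country_similarity` are hypothetical and do not come from the paper or the dataset.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two discrete distributions.

    p and q are sequences of probabilities over the same answer options.
    With base-2 logs the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer options")
    # Mixture distribution used by the Jensen-Shannon divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(divergence, 0.0))


def country_similarity(model_dists, human_dists):
    """Average of (1 - JS distance) over questions (assumed metric form).

    model_dists[i] is the model's answer distribution for question i;
    human_dists[i] is one country's response distribution for question i.
    """
    if not model_dists or len(model_dists) != len(human_dists):
        raise ValueError("need one human distribution per model distribution")
    scores = [1.0 - js_distance(p, q) for p, q in zip(model_dists, human_dists)]
    return sum(scores) / len(scores)


# Toy values for illustration only, not taken from GlobalOpinionQA.
model = [[0.7, 0.2, 0.1], [0.5, 0.5]]
country = [[0.6, 0.3, 0.1], [0.2, 0.8]]
print(country_similarity(model, country))
```

Under this assumed form, a score of 1 means the model's answer distributions match the country's on every question, and a score of 0 means they never overlap. Computing the score once per country gives the per-country comparison the abstract describes.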
