Towards Measuring the Representation of Subjective Global Opinions in Language Models

Abstract

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses most resemble. We first build a dataset, GlobalOpinionQA, composed of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions into a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of that language. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
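The abstract does not spell out the similarity metric, so the following is only an illustrative sketch of how a country-conditioned similarity score could be computed. It assumes that both the model and the human respondents of a country are summarized as probability distributions over the answer options of each question, and it assumes similarity is taken as one minus the Jensen-Shannon distance, averaged over questions. The function names and the toy data are hypothetical and are not taken from the released dataset.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same answer options")
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # KL divergence; terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


def similarity(model_probs, human_probs):
    """Assumed similarity score: 1 minus the Jensen-Shannon distance."""
    return 1.0 - jensen_shannon_distance(model_probs, human_probs)


def country_similarity(model_by_question, human_by_question):
    """Average per-question similarity for one country (hypothetical helper)."""
    shared = [q for q in model_by_question if q in human_by_question]
    if not shared:
        raise ValueError("no questions in common")
    total = sum(
        similarity(model_by_question[q], human_by_question[q]) for q in shared
    )
    return total / len(shared)


# Toy example with made-up numbers: two questions, each with answer-option
# probabilities for the model and for one country's survey respondents.
model_answers = {"q1": [0.7, 0.2, 0.1], "q2": [0.5, 0.5]}
country_answers = {"q1": [0.6, 0.3, 0.1], "q2": [0.5, 0.5]}
score = country_similarity(model_answers, country_answers)
```

Under these assumptions, a score of 1 means the model's answer distribution matches the country's exactly, and a score of 0 means the two distributions place their mass on entirely different options.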
