Towards Measuring the Representation of Subjective Global Opinions in Language Models
June 28, 2023
Authors: Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
cs.AI
Abstract
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
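
To make the abstract's similarity metric concrete, below is a minimal sketch of one way to score how close a model's answer to a survey question is to a given country's human responses. It assumes the comparison is between probability distributions over a question's answer options and uses 1 minus the Jensen-Shannon distance as the similarity; the function name, the toy distributions, and the choice of metric are illustrative assumptions rather than the paper's exact formulation. The dataset identifier Anthropic/llm_global_opinions is taken from the URL in the abstract.

```python
# Illustrative sketch only: assumes similarity = 1 - Jensen-Shannon distance
# between answer-option distributions, which may differ from the paper's
# exact definition.
import numpy as np
from scipy.spatial.distance import jensenshannon
from datasets import load_dataset  # pip install datasets

def similarity(model_dist, country_dist):
    """Similarity between a model's distribution over a question's answer
    options and a country's human response distribution (1.0 = identical)."""
    p = np.asarray(model_dist, dtype=float)
    q = np.asarray(country_dist, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # normalize raw counts to probabilities
    # jensenshannon returns the JS distance (sqrt of the JS divergence);
    # with base=2 it lies in [0, 1], so 1 - distance is a similarity in [0, 1].
    return 1.0 - jensenshannon(p, q, base=2)

# Toy example: a 4-option question where the model favors option A while a
# hypothetical country's respondents favor option C.
model_dist = [0.6, 0.2, 0.1, 0.1]
country_dist = [0.1, 0.1, 0.7, 0.1]
print(f"similarity: {similarity(model_dist, country_dist):.3f}")

# The released questions and country-level responses can be loaded from the
# Hugging Face dataset referenced in the abstract.
dataset = load_dataset("Anthropic/llm_global_opinions")
print(dataset)
```

In experiments like those described above, a per-question score of this kind would be aggregated across questions to compare which countries' opinions the model's default, cross-nationally prompted, or translated responses resemble most.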