Large Language Models Discriminate Against Speakers of German Dialects

Abstract

Dialects are a significant component of human culture and are found in every region of the world. In Germany, more than 40% of the population speaks a regional dialect (Adler and Hansen, 2022). Despite this cultural importance, people who speak dialects often face negative societal stereotypes. We examine whether large language models (LLMs) mirror such stereotypes. We draw on the sociolinguistic literature on dialect perception to analyze traits commonly associated with dialect speakers. Based on these traits, we assess the dialect naming bias and dialect usage bias expressed by LLMs in two tasks: an association task and a decision task. To assess a model's dialect usage bias, we construct a novel evaluation corpus that pairs sentences from seven regional German dialects (e.g., Alemannic and Bavarian) with their standard German counterparts. We find that: (1) in the association task, all evaluated LLMs exhibit significant dialect naming and dialect usage bias against German dialect speakers, reflected in negative adjective associations; (2) all models reproduce these dialect naming and dialect usage biases in their decision making; and (3) contrary to prior work showing minimal bias when demographics are mentioned explicitly, explicitly labeling a linguistic demographic (German dialect speakers) amplifies bias more than implicit cues such as dialect usage.

