Large Language Models Discriminate Against Speakers of German Dialects

Abstract

Dialects represent a significant component of human culture and are found across all regions of the world. In Germany, more than 40% of the population speaks a regional dialect (Adler and Hansen, 2022). However, despite their cultural importance, individuals speaking dialects often face negative societal stereotypes. We examine whether such stereotypes are mirrored by large language models (LLMs). We draw on the sociolinguistic literature on dialect perception to analyze traits commonly associated with dialect speakers. Based on these traits, we assess the dialect naming bias and dialect usage bias expressed by LLMs in two tasks: an association task and a decision task. To assess a model's dialect usage bias, we construct a novel evaluation corpus that pairs sentences from seven regional German dialects (e.g., Alemannic and Bavarian) with their standard German counterparts. We find that: (1) in the association task, all evaluated LLMs exhibit significant dialect naming and dialect usage bias against German dialect speakers, reflected in negative adjective associations; (2) all models reproduce these dialect naming and dialect usage biases in their decision making; and (3) contrary to prior work showing minimal bias with explicit demographic mentions, we find that explicitly labeling linguistic demographics (German dialect speakers) amplifies bias more than implicit cues like dialect usage.
