Large Language Models Discriminate Against Speakers of German Dialects

Abstract

Dialects are a significant component of human culture and are found in every region of the world. In Germany, more than 40% of the population speaks a regional dialect (Adler and Hansen, 2022). Despite this cultural importance, dialect speakers often face negative societal stereotypes. We examine whether large language models (LLMs) mirror such stereotypes. Drawing on the sociolinguistic literature on dialect perception, we analyze traits commonly associated with dialect speakers. Based on these traits, we assess two kinds of bias expressed by LLMs, dialect naming bias and dialect usage bias, in two tasks: an association task and a decision task. To assess a model's dialect usage bias, we construct a novel evaluation corpus that pairs sentences from seven regional German dialects (e.g., Alemannic and Bavarian) with their standard German counterparts. We find that: (1) in the association task, all evaluated LLMs exhibit significant dialect naming bias and dialect usage bias against German dialect speakers, reflected in negative adjective associations; (2) all models reproduce these dialect naming and dialect usage biases in their decision making; and (3) contrary to prior work showing minimal bias with explicit demographic mentions, explicitly labeling linguistic demographics (German dialect speakers) amplifies bias more than implicit cues such as dialect usage.

