Large Language Models Discriminate Against Speakers of German Dialects

Abstract


Dialects represent a significant component of human culture and are found across all regions of the world. In Germany, more than 40% of the population speaks a regional dialect (Adler and Hansen, 2022). However, despite the cultural importance of dialects, individuals who speak them often face negative societal stereotypes. We examine whether such stereotypes are mirrored by large language models (LLMs). We draw on the sociolinguistic literature on dialect perception to analyze traits commonly associated with dialect speakers. Based on these traits, we assess the dialect naming bias and dialect usage bias expressed by LLMs in two tasks: an association task and a decision task. To assess a model's dialect usage bias, we construct a novel evaluation corpus that pairs sentences from seven regional German dialects (e.g., Alemannic and Bavarian) with their standard German counterparts. We find that: (1) in the association task, all evaluated LLMs exhibit significant dialect naming and dialect usage bias against German dialect speakers, reflected in negative adjective associations; (2) all models reproduce these dialect naming and dialect usage biases in their decision making; and (3) contrary to prior work showing minimal bias with explicit demographic mentions, we find that explicitly labeling linguistic demographics (German dialect speakers) amplifies bias more than implicit cues such as dialect usage.
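The abstract does not specify how association bias is scored or tested for significance. As an illustration only, the sketch below shows one simple way to quantify a dialect usage bias over a paired corpus, in the spirit of matched-guise probing: for each dialect/standard German pair, record which version the model linked to a negative trait adjective, then compare the dialect share against the 50% expected from an unbiased model. The labels `"dialect"`/`"standard"`, the function names, and the one-sided sign test are all hypothetical choices, not the authors' metric.

```python
import math
from typing import Sequence

# Hypothetical labels: which member of a dialect/standard pair the model
# associated with a negative trait adjective.
DIALECT = "dialect"
STANDARD = "standard"


def _count_dialect(choices: Sequence[str]) -> int:
    """Validate the labels and count pairs where the dialect text was chosen."""
    if not choices:
        raise ValueError("need at least one paired judgment")
    for c in choices:
        if c not in (DIALECT, STANDARD):
            raise ValueError(f"unexpected label: {c!r}")
    return sum(1 for c in choices if c == DIALECT)


def dialect_bias_score(choices: Sequence[str]) -> float:
    """Share of pairs where the negative trait went to the dialect text, minus 0.5.

    0.0 means no preference; positive values mean the dialect version is
    more often linked to the negative trait.
    """
    return _count_dialect(choices) / len(choices) - 0.5


def one_sided_sign_test_p(choices: Sequence[str]) -> float:
    """Exact one-sided sign test: P(X >= k) for X ~ Binomial(n, 0.5)."""
    n = len(choices)
    k = _count_dialect(choices)
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n


if __name__ == "__main__":
    choices = [DIALECT, DIALECT, STANDARD, DIALECT]
    print(dialect_bias_score(choices))      # 0.25
    print(one_sided_sign_test_p(choices))   # 0.3125
```

With only four pairs, a 75% dialect share is far from significant (p = 0.3125); a real evaluation would need many pairs and would likely also correct for multiple comparisons across traits and models.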
