Probing Cultural Signals in Large Language Models through Author Profiling

Abstract

Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gender and ethnicity without task-specific fine-tuning. Across several open-source models evaluated on more than 10,000 lyrics, we find that LLMs achieve non-trivial profiling performance but exhibit systematic cultural alignment: most models default toward North American ethnicity, while DeepSeek-1.5B aligns more strongly with Asian ethnicity. This finding emerges from both the models' prediction distributions and an analysis of their generated rationales. To quantify these disparities, we introduce two fairness metrics, Modality Accuracy Divergence (MAD) and Recall Divergence (RD), and show that Ministral-8B displays the strongest ethnicity bias among the evaluated models, whereas Gemma-12B shows the most balanced behavior. Our code is available on GitHub (https://github.com/ValentinLafargue/CulturalProbingLLM).

