Probing Cultural Signals in Large Language Models through Author Profiling

Abstract

Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gender and ethnicity without task-specific fine-tuning. Across several open-source models evaluated on more than 10,000 lyrics, we find that LLMs achieve non-trivial profiling performance but show systematic cultural alignment: most models default toward North American ethnicity, while DeepSeek-1.5B aligns more strongly with Asian ethnicity. This finding emerges from both the models' prediction distributions and an analysis of their generated rationales. To quantify these disparities, we introduce two fairness metrics, Modality Accuracy Divergence (MAD) and Recall Divergence (RD), and show that Ministral-8B displays the strongest ethnicity bias among the evaluated models, whereas Gemma-12B shows the most balanced behavior. Our code is available on GitHub (https://github.com/ValentinLafargue/CulturalProbingLLM).
