Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data
August 20, 2024
Authors: Atmika Gorti, Manas Gaur, Aman Chadha
cs.AI
Abstract
Large Language Models (LLMs) are prone to inheriting and amplifying societal
biases embedded within their training data, potentially reinforcing harmful
stereotypes related to gender, occupation, and other sensitive categories. This
issue becomes particularly problematic as biased LLMs can have far-reaching
consequences, leading to unfair practices and exacerbating social inequalities
across various domains, such as recruitment, online content moderation, or even
the criminal justice system. Although prior research has focused on detecting
bias in LLMs using specialized datasets designed to highlight intrinsic biases,
there has been a notable lack of investigation into how these findings
correlate with authoritative datasets, such as those from the U.S. National
Bureau of Labor Statistics (NBLS). To address this gap, we conduct empirical
research that evaluates LLMs in a "bias-out-of-the-box" setting, analyzing how
the generated outputs compare with the distributions found in NBLS data.
Furthermore, we propose a straightforward yet effective debiasing mechanism
that directly incorporates NBLS instances to mitigate bias within LLMs. Our
study spans seven different LLMs, including instructable, base, and
mixture-of-expert models, and reveals significant levels of bias that are often
overlooked by existing bias detection techniques. Importantly, our debiasing
method, which does not rely on external datasets, demonstrates a substantial
reduction in bias scores, highlighting the efficacy of our approach in creating
fairer and more reliable LLMs.