To Bias or Not to Bias: Detecting Bias in News with bias-detector
May 19, 2025
Authors: Himel Ghosh, Ahmed Mosharafa, Georg Groh
cs.AI
Abstract
Media bias detection is a critical task in ensuring fair and balanced
information dissemination, yet it remains challenging due to the subjectivity
of bias and the scarcity of high-quality annotated data. In this work, we
perform sentence-level bias classification by fine-tuning a RoBERTa-based model
on the expert-annotated BABE dataset. Using McNemar's test and the 5x2
cross-validation paired t-test, we show statistically significant improvements
in performance when comparing our model to a domain-adaptively pre-trained
DA-RoBERTa baseline. Furthermore, attention-based analysis shows that our model
avoids common pitfalls like oversensitivity to politically charged terms and
instead attends more meaningfully to contextually relevant tokens. For a
comprehensive examination of media bias, we present a pipeline that combines
our model with an existing bias-type classifier. Our method generalizes well
and remains interpretable, although it is limited to sentence-level analysis
and to a small dataset, since larger and more advanced bias corpora are not yet
available. We discuss context-aware modeling, bias neutralization, and advanced
bias-type classification as potential future directions. Our findings
contribute to building more robust, explainable, and socially responsible NLP
systems for media bias detection.
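
The two significance tests named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `mcnemar_exact` and `five_by_two_cv_t` are hypothetical, the exact binomial form of McNemar's test is one common variant (the abstract does not say which variant was used), and the 5x2 statistic follows the standard formulation by Dietterich. Only the Python standard library is assumed.

```python
from math import comb, sqrt


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (illustrative helper).

    b counts items that model A gets right and model B gets wrong;
    c counts the reverse. Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


def five_by_two_cv_t(diffs):
    """t statistic of the 5x2 cross-validation paired t-test (illustrative helper).

    diffs holds five (d1, d2) pairs: the score difference between the two
    models on each fold of five independent 2-fold splits. The statistic is
    compared against a t distribution with 5 degrees of freedom.
    """
    if len(diffs) != 5:
        raise ValueError("expected five (fold 1, fold 2) difference pairs")
    variances = []
    for d1, d2 in diffs:
        mean = (d1 + d2) / 2
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)
    pooled = sum(variances) / 5
    if pooled == 0:
        raise ValueError("zero variance across folds; statistic is undefined")
    return diffs[0][0] / sqrt(pooled)


# Toy data, invented purely for illustration (not results from the paper).
y_true = [1] * 10
pred_a = [1] * 10            # hypothetical model A: all correct
pred_b = [0] * 8 + [1] * 2   # hypothetical model B: 8 errors
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
t_stat = five_by_two_cv_t([(0.02, 0.04)] * 5)
```

For the 5x2 test, a two-sided result at the 0.05 level requires the absolute value of the statistic to exceed roughly 2.571, the critical value of the t distribution with 5 degrees of freedom.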