To Bias or Not to Bias: Detecting bias in News with bias-detector

Abstract

Media bias detection is critical to fair and balanced information dissemination, yet it remains challenging because bias is subjective and high-quality annotated data is scarce. In this work, we perform sentence-level bias classification by fine-tuning a RoBERTa-based model on the expert-annotated BABE dataset. Using McNemar's test and the 5x2 cross-validation paired t-test, we show that our model achieves statistically significant performance improvements over a domain-adaptively pre-trained DA-RoBERTa baseline. Furthermore, attention-based analysis shows that our model avoids common pitfalls such as oversensitivity to politically charged terms and instead attends more meaningfully to contextually relevant tokens. For a comprehensive examination of media bias, we present a pipeline that combines our model with an existing bias-type classifier. Our method exhibits good generalization and interpretability, although it is limited to sentence-level analysis and by dataset size, since larger and more advanced bias corpora are not yet available. We discuss context-aware modeling, bias neutralization, and advanced bias-type classification as potential future directions. Our findings contribute to building more robust, explainable, and socially responsible NLP systems for media bias detection.
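To illustrate the kind of paired comparison the abstract refers to, here is a minimal sketch of an exact two-sided McNemar's test on the predictions of two classifiers over the same test sentences. The function name `mcnemar_exact` and the toy labels are assumptions made for illustration; this is not the authors' evaluation code, and the paper may use a different variant of the test (for example the chi-squared approximation).

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same examples.

    Returns (b, c, p_value), where b counts examples only model A got right
    and c counts examples only model B got right. (Hypothetical helper.)
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # The models never disagree in correctness: no evidence of a difference.
        return b, c, 1.0
    # Under H0 the discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy data (made up): 1 = biased, 0 = not biased.
y_true = [1] * 10
pred_a = [1] * 10            # model A is right on all ten sentences
pred_b = [0] * 8 + [1] * 2   # model B is right on only the last two
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```

Only the discordant pairs (sentences where exactly one model is correct) enter the statistic, which is why the test suits comparing two models evaluated on the same test set.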

