To Bias or Not to Bias: Detecting bias in News with bias-detector
May 19, 2025
Authors: Himel Ghosh, Ahmed Mosharafa, Georg Groh
cs.AI
Abstract
Media bias detection is a critical task in ensuring fair and balanced
information dissemination, yet it remains challenging due to the subjectivity
of bias and the scarcity of high-quality annotated data. In this work, we
perform sentence-level bias classification by fine-tuning a RoBERTa-based model
on the expert-annotated BABE dataset. Using McNemar's test and the 5x2
cross-validation paired t-test, we show statistically significant improvements
in performance when comparing our model to a domain-adaptively pre-trained
DA-RoBERTa baseline. Furthermore, attention-based analysis shows that our model
avoids common pitfalls like oversensitivity to politically charged terms and
instead attends more meaningfully to contextually relevant tokens. For a
comprehensive examination of media bias, we present a pipeline that combines
our model with an existing bias-type classifier. Despite being constrained to
sentence-level analysis and limited by dataset size, owing to the lack of
larger and more advanced bias corpora, our method exhibits good generalization
and interpretability. We discuss context-aware modeling, bias neutralization,
and advanced bias-type classification as potential future directions. Our
findings contribute to building more robust, explainable, and socially
responsible NLP systems for media bias detection.
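The significance claim rests on McNemar's test over the two models' paired per-sentence predictions. As a minimal sketch of the exact (binomial) form of that test, here is a self-contained implementation; the function name and the disagreement counts below are illustrative, not the paper's actual figures:

```python
import math

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on paired predictions.

    b: sentences the baseline classifies correctly but our model misses.
    c: sentences the baseline misses but our model classifies correctly.
    Under H0 (no difference between models), the b+c disagreements
    split as Binomial(b+c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    # Two-sided p-value: double the smaller binomial tail, capped at 1.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1))
    return min(2 * tail / 2 ** n, 1.0)

# Hypothetical disagreement counts: of 35 sentences on which the two
# models disagree, the fine-tuned model is correct on 30.
p = mcnemar_exact(5, 30)
print(f"p = {p:.2e}")
```

A p-value below the usual 0.05 threshold here would indicate that the fine-tuned model's advantage over the DA-RoBERTa baseline is unlikely to arise from chance disagreements alone.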