

Hatevolution: What Static Benchmarks Don't Tell Us

June 13, 2025
作者: Chiara Di Bonaventura, Barbara McGillivray, Yulan He, Albert Meroño-Peñuela
cs.AI

Abstract

Language changes over time, including in the hate speech domain, which evolves quickly in response to social dynamics and cultural shifts. While NLP research has investigated the impact of language evolution on model training and has proposed several solutions for it, its impact on model benchmarking remains under-explored. Yet, hate speech benchmarks play a crucial role in ensuring model safety. In this paper, we empirically evaluate the robustness of 20 language models across two evolving hate speech experiments, and we show the temporal misalignment between static and time-sensitive evaluations. Our findings call for time-sensitive linguistic benchmarks in order to correctly and reliably evaluate language models in the hate speech domain.