MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Abstract

Inflammatory or misleading "fake" news content has become increasingly common in recent years. At the same time, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. The combination of the two, AI-generated fake news content, is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset, we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.
