MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Abstract

Inflammatory or misleading "fake" news content has become increasingly common in recent years. At the same time, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. The combination of the two, AI-generated fake news content, is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset, we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.