MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Abstract

The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. At the same time, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. The combination of the two, AI-generated fake news content, is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge both to humans (60% F-1) and to state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset, we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.

