ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection

Abstract

Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text-image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they are usually limited to short captions, single images, binary labels, or a single manipulation source, while agentic verification remains costly under realistic evidence search. We present ReMMD, a realistic multilingual multi-image agentic verification framework for multimodal misinformation detection. ReMMD includes ReMMDBench, a real-world multimodal misinformation detection benchmark with 500 samples, 2,756 images, five monolingual settings, two cross-lingual settings, three text-length tiers, multi-image posts, five-way veracity labels, eight distortion labels, evidence provenance, and rationales. It also includes ReMMD-Agent, a persistent-memory verifier that decomposes posts into atomic points, builds a reusable evidence set, and predicts structured L1/L2/L3 outputs. Across proprietary systems, open LVLMs, MMD-Agent, and T2-Agent, ReMMD-Agent obtains the best five-way veracity performance, with 41.80% accuracy and 39.12% macro-F1 using GPT-5.2, while reducing cost by 17.5% relative to MMD-Agent and 79.9% relative to T2-Agent. The project is available at https://dang-ai.github.io/ReMMD.
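The abstract reports both accuracy and macro-F1 for the five-way veracity task. As a minimal sketch of how the two metrics differ, the Python below uses the standard definitions; it is not the authors' evaluation code. The integer class IDs 0 to 4 and the toy predictions are hypothetical placeholders, since the abstract does not name the five veracity labels.

```python
def accuracy(gold, pred):
    """Fraction of samples whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores over the given label set."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equal length")
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)


# Hypothetical class IDs standing in for the five veracity labels.
LABELS = [0, 1, 2, 3, 4]

# Toy data: a degenerate classifier that always predicts class 0.
gold = [0, 0, 0, 0, 1, 2, 3, 4]
pred = [0, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred, LABELS)
print(f"accuracy={acc:.4f} macro_f1={mf1:.4f}")
```

On this toy data the always-predict-one-class baseline reaches 50% accuracy but a macro-F1 of only about 13%, because four of the five classes contribute an F1 of zero. Macro-F1 weights every class equally, which is why it is informative to report alongside accuracy in a multi-class setting where classes may be imbalanced.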