

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

July 2, 2025
作者: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma
cs.AI

Abstract

The proliferation of multimodal memes in the social media era demands that multimodal Large Language Models (mLLMs) effectively understand meme harmfulness. Existing benchmarks for assessing mLLMs on harmful meme understanding rely on accuracy-based, model-agnostic evaluations using static datasets. These benchmarks are limited in their ability to provide up-to-date and thorough assessments, as online memes evolve dynamically. To address this, we propose AdamMeme, a flexible, agent-based evaluation framework that adaptively probes the reasoning capabilities of mLLMs in deciphering meme harmfulness. Through multi-agent collaboration, AdamMeme provides comprehensive evaluations by iteratively updating the meme data with challenging samples, thereby exposing specific limitations in how mLLMs interpret harmfulness. Extensive experiments show that our framework systematically reveals the varying performance of different target mLLMs, offering in-depth, fine-grained analyses of model-specific weaknesses. Our code is available at https://github.com/Lbotirx/AdamMeme.
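The abstract's core mechanism, iteratively updating the evaluation pool with samples the target model fails on, can be sketched as a simple loop. This is a hypothetical illustration only: the class and function names (`MemeSample`, `probe_round`, `adaptive_probe`, `mutate`), the text-only stand-in for image+text memes, and the retain-and-augment logic are all assumptions, not AdamMeme's actual multi-agent implementation.

```python
# Hypothetical sketch of adaptive probing in the spirit of AdamMeme:
# keep only the samples the target model misclassifies, then seed the
# next round with harder variants produced by an "augmenter" agent.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class MemeSample:
    text: str       # meme caption (stands in for the image+text pair)
    harmful: bool   # gold harmfulness label

def probe_round(target: Callable[[MemeSample], bool],
                pool: List[MemeSample]) -> List[MemeSample]:
    """Return the samples the target model misclassifies."""
    return [s for s in pool if target(s) != s.harmful]

def adaptive_probe(target: Callable[[MemeSample], bool],
                   pool: List[MemeSample],
                   mutate: Callable[[MemeSample], MemeSample],
                   rounds: int = 3) -> Tuple[List[MemeSample], List[int]]:
    """Iteratively narrow the pool toward the model's weak spots.

    Each round keeps only the failed samples and adds one mutated
    (presumably harder) variant per failure; the per-round failure
    counts expose where the model breaks down.
    """
    hard = pool
    history = []
    for _ in range(rounds):
        hard = probe_round(target, hard)
        history.append(len(hard))
        if not hard:
            break
        # A generator agent would craft harder variants here; a plain
        # mutation function stands in for it in this sketch.
        hard = hard + [mutate(s) for s in hard]
    return hard, history
```

In this toy setup, a target that never flags harmfulness fails on every harmful sample, so the failure history grows each round, which is exactly the model-specific weakness signal an adaptive benchmark surfaces.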
July 10, 2025