AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
July 2, 2025
Authors: Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Zhen Ye, Guang Chen, Zhiyong Huang, Jing Ma
cs.AI
Abstract
The proliferation of multimodal memes in the social media era demands that
multimodal Large Language Models (mLLMs) effectively understand meme
harmfulness. Existing benchmarks for assessing mLLMs on harmful meme
understanding rely on accuracy-based, model-agnostic evaluations using static
datasets. These benchmarks are limited in their ability to provide up-to-date
and thorough assessments, as online memes evolve dynamically. To address this,
we propose AdamMeme, a flexible, agent-based evaluation framework that
adaptively probes the reasoning capabilities of mLLMs in deciphering meme
harmfulness. Through multi-agent collaboration, AdamMeme provides comprehensive
evaluations by iteratively updating the meme data with challenging samples,
thereby exposing specific limitations in how mLLMs interpret harmfulness.
Extensive experiments show that our framework systematically reveals the
varying performance of different target mLLMs, offering in-depth, fine-grained
analyses of model-specific weaknesses. Our code is available at
https://github.com/Lbotirx/AdamMeme.
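
The abstract describes the framework only at a high level; as an illustration of what an adaptive, agent-based probing loop of this kind might look like, here is a minimal Python sketch. All names (`adaptive_probe`, `judge`, `mine_harder`) are hypothetical and are not taken from the AdamMeme codebase or paper.

```python
# Hypothetical sketch of an adaptive probing loop in the spirit of AdamMeme:
# keep the memes a target mLLM fails to explain convincingly, then refresh the
# evaluation pool with harder samples derived from those failures.
# Function and class names are illustrative only, not the actual AdamMeme API.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Meme:
    image_path: str
    caption: str


def adaptive_probe(
    target_mllm: Callable[[Meme], str],               # returns the model's harmfulness explanation
    judge: Callable[[Meme, str], float],              # scores explanation quality in [0, 1]
    mine_harder: Callable[[List[Meme]], List[Meme]],  # proposes more challenging samples
    seed_memes: List[Meme],
    rounds: int = 3,
    fail_threshold: float = 0.5,
) -> List[Meme]:
    """Return the memes the target model consistently fails to interpret."""
    pool = list(seed_memes)
    hard_cases: List[Meme] = []
    for _ in range(rounds):
        failures = [
            meme for meme in pool
            if judge(meme, target_mllm(meme)) < fail_threshold
        ]
        hard_cases.extend(failures)
        # Iteratively update the pool with samples derived from current failures,
        # so later rounds probe the model's specific weaknesses.
        pool = mine_harder(failures) if failures else pool
    return hard_cases
```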