How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
February 20, 2024
Authors: Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan
cs.AI
Abstract
The remarkable advancements in Multimodal Large Language Models (MLLMs) have
not rendered them immune to challenges, particularly in the context of handling
deceptive information in prompts, thus producing hallucinated responses under
such conditions. To quantitatively assess this vulnerability, we present
MAD-Bench, a carefully curated benchmark that contains 850 test samples divided
into 6 categories, such as non-existent objects, count of objects, spatial
relationship, and visual confusion. We provide a comprehensive analysis of
popular MLLMs, ranging from GPT-4V and Gemini-Pro to open-source models such as
LLaVA-1.5 and CogVLM. Empirically, we observe significant performance gaps
between GPT-4V and other models; and previous robust instruction-tuned models,
such as LRV-Instruction and LLaVA-RLHF, are not effective on this new
benchmark. While GPT-4V achieves 75.02% accuracy on MAD-Bench, the accuracy of
any other model in our experiments ranges from 5% to 35%. We further propose a
remedy that adds an additional paragraph to the deceptive prompts to encourage
models to think twice before answering the question. Surprisingly, this simple
method can even double the accuracy; however, the absolute numbers are still
too low to be satisfactory. We hope MAD-Bench can serve as a valuable benchmark
to stimulate further research to enhance models' resilience against deceptive
prompts.
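The remedy described above is a prompt-level intervention: an extra paragraph is appended to each deceptive prompt to encourage the model to verify the premise before answering. The sketch below illustrates the idea only; the wording of the appended paragraph and the `query_mllm` helper are illustrative assumptions, not the paper's exact prompt or code.

```python
# Minimal sketch (assumed wording, not the paper's exact paragraph):
# append a cautionary paragraph to a deceptive prompt so the model
# checks the premise against the image before answering.

CAUTION_PARAGRAPH = (
    "Before answering, carefully check whether the premise of the question "
    "is consistent with the image. If the question mentions objects, counts, "
    "or spatial relationships that do not actually appear in the image, "
    "point out the inconsistency instead of answering as if it were true."
)

def add_caution(deceptive_prompt: str) -> str:
    """Return the prompt with the extra 'think twice' paragraph appended."""
    return f"{deceptive_prompt}\n\n{CAUTION_PARAGRAPH}"

# Hypothetical usage with some MLLM client (query_mllm is a placeholder):
# response = query_mllm(image_path="example.jpg",
#                       prompt=add_caution("What color is the dog's hat?"))
```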