How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
February 20, 2024
Authors: Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan
cs.AI
Abstract
The remarkable advancements in Multimodal Large Language Models (MLLMs) have
not rendered them immune to challenges, particularly in the context of handling
deceptive information in prompts, thus producing hallucinated responses under
such conditions. To quantitatively assess this vulnerability, we present
MAD-Bench, a carefully curated benchmark that contains 850 test samples divided
into 6 categories, such as non-existent objects, count of objects, spatial
relationship, and visual confusion. We provide a comprehensive analysis of
popular MLLMs, ranging from GPT-4V and Gemini-Pro to open-source models such as
LLaVA-1.5 and CogVLM. Empirically, we observe significant performance gaps
between GPT-4V and other models; and previous robust instruction-tuned models,
such as LRV-Instruction and LLaVA-RLHF, are not effective on this new
benchmark. While GPT-4V achieves 75.02% accuracy on MAD-Bench, the accuracy of
any other model in our experiments ranges from 5% to 35%. We further propose a
remedy that adds an additional paragraph to the deceptive prompts to encourage
models to think twice before answering the question. Surprisingly, this simple
method can even double the accuracy; however, the absolute numbers are still
too low to be satisfactory. We hope MAD-Bench can serve as a valuable benchmark
to stimulate further research to enhance models' resilience against deceptive
prompts.
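The remedy described above is a prompt-level intervention: an extra paragraph is appended to each deceptive prompt to encourage the model to verify the premise before answering. The sketch below illustrates the idea only; the wording of the appended paragraph and the `query_mllm` helper are illustrative assumptions, not the paper's exact prompt or code.

```python
# Minimal sketch (assumed wording, not the paper's exact paragraph):
# append a cautionary paragraph to a deceptive prompt so the model
# checks the premise against the image before answering.

CAUTION_PARAGRAPH = (
    "Before answering, carefully check whether the premise of the question "
    "is consistent with the image. If the question mentions objects, counts, "
    "or spatial relationships that do not actually appear in the image, "
    "point out the inconsistency instead of answering as if it were true."
)

def add_caution(deceptive_prompt: str) -> str:
    """Return the prompt with the extra 'think twice' paragraph appended."""
    return f"{deceptive_prompt}\n\n{CAUTION_PARAGRAPH}"

# Hypothetical usage with some MLLM client (query_mllm is a placeholder):
# response = query_mllm(image_path="example.jpg",
#                       prompt=add_caution("What color is the dog's hat?"))
```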