How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts
February 20, 2024
Authors: Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan
cs.AI
Abstract
The remarkable advancements in Multimodal Large Language Models (MLLMs) have not rendered them immune to challenges; in particular, they are prone to producing hallucinated responses when handling deceptive information in prompts. To quantitatively assess this vulnerability, we present MAD-Bench, a carefully curated benchmark that contains 850 test samples divided into 6 categories, such as non-existent objects, count of objects, spatial relationships, and visual confusion. We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4V and Gemini-Pro to open-source models such as LLaVA-1.5 and CogVLM. Empirically, we observe a significant performance gap between GPT-4V and the other models; moreover, previous robust instruction-tuned models, such as LRV-Instruction and LLaVA-RLHF, are not effective on this new benchmark. While GPT-4V achieves 75.02% accuracy on MAD-Bench, the accuracy of every other model in our experiments ranges from 5% to 35%. We further propose a remedy that adds an additional paragraph to the deceptive prompts to encourage models to think twice before answering the question. Surprisingly, this simple method can even double the accuracy; however, the absolute numbers remain too low to be satisfactory. We hope MAD-Bench can serve as a valuable benchmark to stimulate further research on enhancing models' resilience against deceptive prompts.
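To make the proposed remedy concrete, below is a minimal sketch of how such prompt augmentation could be implemented. The wording of the cautionary paragraph, the `add_think_twice` helper, and the example prompt are illustrative assumptions, not the paper's verbatim text or code.

```python
# Minimal sketch of the "think twice" prompt-augmentation remedy.
# The paragraph wording below is an assumed example; the paper does not
# publish its exact text in the abstract.

THINK_TWICE_PARAGRAPH = (
    "Before answering, carefully verify the premise of the question "
    "against the image. If the question refers to objects, counts, or "
    "spatial relationships that do not match the image, point out the "
    "inconsistency instead of answering as if the premise were true."
)

def add_think_twice(deceptive_prompt: str) -> str:
    """Prepend a cautionary paragraph to a possibly deceptive prompt."""
    return f"{THINK_TWICE_PARAGRAPH}\n\n{deceptive_prompt}"

if __name__ == "__main__":
    # Hypothetical deceptive prompt: the image contains no cat at all.
    prompt = "What color is the cat sitting on the sofa?"
    print(add_think_twice(prompt))
```

The augmented prompt would then be passed, together with the image, to whichever MLLM is being evaluated; per the abstract, this simple intervention can roughly double accuracy on MAD-Bench, though absolute performance remains low.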