How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Abstract

Despite remarkable advances, Multimodal Large Language Models (MLLMs) are not immune to hallucination, particularly when a prompt contains deceptive information. To quantitatively assess this vulnerability, we present MAD-Bench, a carefully curated benchmark of 850 test samples divided into 6 categories, including non-existent objects, count of objects, spatial relationship, and visual confusion. We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4V and Gemini-Pro to open-source models such as LLaVA-1.5 and CogVLM. Empirically, we observe significant performance gaps between GPT-4V and the other models, and previous robust instruction-tuned models, such as LRV-Instruction and LLaVA-RLHF, are not effective on this new benchmark. While GPT-4V achieves 75.02% accuracy on MAD-Bench, the accuracy of every other model in our experiments ranges from 5% to 35%. We further propose a remedy that adds a paragraph to the deceptive prompts to encourage models to think twice before answering the question. Surprisingly, this simple method can even double the accuracy; however, the absolute numbers are still too low to be satisfactory. We hope MAD-Bench can serve as a valuable benchmark to stimulate further research on improving models' resilience against deceptive prompts.
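The remedy described in the abstract, adding a paragraph that asks the model to think twice, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not quote the paragraph the authors use, so the wording of `CAUTION_PARAGRAPH`, its placement after the question, the example prompt, and the helper names `add_caution_paragraph` and `accuracy` are all hypothetical.

```python
# Hypothetical wording: the abstract does not quote the paragraph the authors add.
CAUTION_PARAGRAPH = (
    "Before answering, check carefully whether everything the question "
    "assumes is actually visible in the image. If the question mentions "
    "something that is not there, say so instead of describing it."
)


def add_caution_paragraph(prompt: str, caution: str = CAUTION_PARAGRAPH) -> str:
    """Return the prompt with a cautionary paragraph appended after a blank line.

    Placing the paragraph after the question is an assumption of this sketch.
    """
    return f"{prompt.strip()}\n\n{caution}"


def accuracy(judgements: list[bool]) -> float:
    """Percentage of responses judged to have resisted the deception."""
    if not judgements:
        raise ValueError("need at least one judged response")
    return 100.0 * sum(judgements) / len(judgements)


if __name__ == "__main__":
    # Illustrative deceptive prompt: imagine the paired image contains no dog.
    deceptive = "What color is the dog's collar?"
    print(add_caution_paragraph(deceptive))
    # Toy judgements, not results from the benchmark.
    print(f"{accuracy([True, False, False, True]):.2f}%")
```

Comparing `accuracy` over the same samples with and without the added paragraph is one simple way to check whether such a paragraph helps a given model.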
