How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Abstract

Despite remarkable advances, Multimodal Large Language Models (MLLMs) remain vulnerable to deceptive information in prompts and tend to produce hallucinated responses when a prompt contains it. To quantify this vulnerability, we present MAD-Bench, a carefully curated benchmark of 850 test samples divided into 6 categories, including non-existent objects, count of objects, spatial relationship, and visual confusion. We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4V and Gemini-Pro to open-source models such as LLaVA-1.5 and CogVLM. Empirically, we observe significant performance gaps between GPT-4V and the other models, and earlier robust instruction-tuned models, such as LRV-Instruction and LLaVA-RLHF, are not effective on this new benchmark. While GPT-4V achieves 75.02% accuracy on MAD-Bench, the accuracy of every other model in our experiments ranges from 5% to 35%. We further propose a remedy that adds a paragraph to the deceptive prompts to encourage models to think twice before answering the question. Surprisingly, this simple method can even double the accuracy; however, the absolute numbers are still too low to be satisfactory. We hope MAD-Bench can serve as a valuable benchmark to stimulate further research on making models more resilient to deceptive prompts.
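
The remedy and the accuracy metric described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the wording of `CAUTION_PARAGRAPH`, the choice to append it rather than prepend it, and the helper names `add_caution` and `accuracy_by_category` are all assumptions, since the abstract specifies neither the paragraph text nor the evaluation code.

```python
from collections import defaultdict

# Hypothetical wording: the abstract does not give the actual paragraph.
CAUTION_PARAGRAPH = (
    "Before answering, check whether every object, count, and relationship "
    "mentioned in the question actually matches the image. If the question "
    "contains incorrect information, point it out."
)


def add_caution(prompt: str, paragraph: str = CAUTION_PARAGRAPH) -> str:
    """Append a cautionary paragraph to a (possibly deceptive) prompt.

    Appending at the end is an assumption; the abstract only says that a
    paragraph is added to the prompt.
    """
    return f"{prompt.strip()}\n\n{paragraph}"


def accuracy_by_category(results):
    """Compute per-category accuracy.

    `results` is an iterable of (category, resisted) pairs, where `resisted`
    is True when the model's response did not go along with the deception.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, resisted in results:
        totals[category] += 1
        if resisted:
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}
```

For instance, `add_caution("What color is the dog's leash?")` returns the original question followed by a blank line and the cautionary paragraph. Comparing `accuracy_by_category` on responses to the original and the augmented prompts would show how much the added paragraph helps in each category.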
