Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
March 29, 2024
Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa
cs.AI
Abstract
This paper introduces a novel and significant challenge for Vision Language
Models (VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the
VLM's ability to withhold answers when faced with unsolvable problems in the
context of Visual Question Answering (VQA) tasks. UPD encompasses three
distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set
Detection (IASD), and Incompatible Visual Question Detection (IVQD). To
investigate the UPD problem in depth, we conduct extensive experiments, which
show that most VLMs, including GPT-4V and LLaVA-Next-34B, struggle with our
benchmarks to varying extents, leaving significant room for improvement. To address UPD, we
explore both training-free and training-based solutions, offering new insights
into their effectiveness and limitations. We hope our insights, together with
future efforts within the proposed UPD settings, will enhance the broader
understanding and development of more practical and reliable VLMs.
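
To make the AAD setting concrete, below is a minimal, hypothetical sketch (not from the paper) of how an "unsolvable" multiple-choice VQA item could be constructed by removing the correct option, and how a refusal might be detected in a model's reply. The function names and refusal markers are illustrative assumptions, not the authors' benchmark construction or evaluation code.

```python
# Hypothetical sketch of the Absent Answer Detection (AAD) idea:
# the correct option is removed, so no listed choice is valid, and a
# trustworthy model should withhold its answer rather than pick one.

def make_aad_question(question, options, correct_key):
    """Return the question with the correct option removed (AAD-style item)."""
    unsolvable_options = {k: v for k, v in options.items() if k != correct_key}
    return question, unsolvable_options

def is_withheld(reply, refusal_markers=("none of the", "cannot answer", "no correct option")):
    """Very rough string heuristic: does the reply signal that the model withheld an answer?"""
    reply_lower = reply.lower()
    return any(marker in reply_lower for marker in refusal_markers)

if __name__ == "__main__":
    question = "What fruit is shown in the image?"
    options = {"A": "Apple", "B": "Banana", "C": "Cherry", "D": "Grape"}
    # Suppose the image shows an apple; removing option A makes the item unsolvable.
    q, opts = make_aad_question(question, options, correct_key="A")
    print(q, opts)
    print(is_withheld("None of the provided options is correct."))  # True: answer withheld
    print(is_withheld("The answer is B."))                          # False: model guessed anyway
```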