Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
March 29, 2024
Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa
cs.AI
Abstract
This paper introduces a novel and significant challenge for Vision Language
Models (VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the
VLM's ability to withhold answers when faced with unsolvable problems in the
context of Visual Question Answering (VQA) tasks. UPD encompasses three
distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set
Detection (IASD), and Incompatible Visual Question Detection (IVQD). To deeply
investigate the UPD problem, we conduct extensive experiments, which indicate that most VLMs,
including GPT-4V and LLaVA-Next-34B, struggle with our benchmarks to varying
extents, highlighting significant room for improvement. To address UPD, we
explore both training-free and training-based solutions, offering new insights
into their effectiveness and limitations. We hope our insights, together with
future efforts within the proposed UPD settings, will enhance the broader
understanding and development of more practical and reliable VLMs.
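The three UPD settings can be pictured as perturbations of an ordinary multiple-choice VQA item. The sketch below is a minimal, hypothetical illustration of that idea and is not the paper's benchmark construction or evaluation code: the VQAItem structure, the make_* helpers, the query_vlm callback, and the refusal-phrase heuristic are all assumptions introduced here for clarity.

# Hypothetical sketch of UPD-style checks; not the authors' harness.
# AAD removes the valid option, IASD swaps in an unrelated option set,
# and IVQD pairs the question with an unrelated image. A response counts
# as correct only if the model withholds an answer.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VQAItem:
    image_path: str      # path to the question image
    question: str        # question text
    options: List[str]   # candidate answers

def make_aad(item: VQAItem) -> VQAItem:
    """Absent Answer Detection: drop an option so no listed choice is valid.
    (Placeholder: a real setup would remove the known correct answer.)"""
    return VQAItem(item.image_path, item.question, item.options[1:])

def make_iasd(item: VQAItem, unrelated_options: List[str]) -> VQAItem:
    """Incompatible Answer Set Detection: replace the options with ones
    unrelated to the question."""
    return VQAItem(item.image_path, item.question, unrelated_options)

def make_ivqd(item: VQAItem, unrelated_image: str) -> VQAItem:
    """Incompatible Visual Question Detection: pair the question with an
    unrelated image."""
    return VQAItem(unrelated_image, item.question, item.options)

REFUSAL_MARKERS = ("none of the", "cannot answer", "no correct option", "not answerable")

def withholds_answer(response: str) -> bool:
    """Heuristic: treat the response as withheld if it contains a refusal phrase."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def upd_accuracy(items: List[VQAItem], query_vlm: Callable[[VQAItem], str]) -> float:
    """Fraction of unsolvable items on which the model withholds an answer.
    query_vlm is a user-supplied function that sends one item to a VLM."""
    if not items:
        return 0.0
    hits = sum(withholds_answer(query_vlm(item)) for item in items)
    return hits / len(items)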