Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

March 29, 2024
Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa
cs.AI

Abstract

This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD). UPD examines a VLM's ability to withhold an answer when faced with an unsolvable problem in Visual Question Answering (VQA) tasks. UPD encompasses three distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Incompatible Visual Question Detection (IVQD). To investigate the UPD problem in depth, we conduct extensive experiments, which indicate that most VLMs, including GPT-4V and LLaVA-Next-34B, struggle with our benchmarks to varying extents, highlighting significant room for improvement. To address UPD, we explore both training-free and training-based solutions, offering new insights into their effectiveness and limitations. We hope our insights, together with future efforts within the proposed UPD settings, will enhance the broader understanding and development of more practical and reliable VLMs.
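To make the three settings concrete, here is a minimal sketch, assuming a multiple-choice VQA item format. It shows one way an unsolvable variant could be derived from an ordinary solvable item and how a response might be scored. The reading of each setting is inferred from its name (AAD: the correct option is missing; IASD: the whole option set is unrelated to the question; IVQD: the image does not match the question). The names `VQAItem`, `make_aad`, `make_iasd`, `make_ivqd`, `is_correct`, and the abstention string `ABSTAIN` are hypothetical illustrations, not the paper's code or benchmark format.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical abstention response; the benchmark's actual wording may differ.
ABSTAIN = "None of the above"


@dataclass
class VQAItem:
    """A multiple-choice VQA item; answer=None marks an unsolvable problem."""
    image_id: str
    question: str
    options: List[str]
    answer: Optional[str]


def make_aad(item: VQAItem) -> VQAItem:
    """Absent Answer Detection (assumed): remove the correct option from the choices."""
    remaining = [opt for opt in item.options if opt != item.answer]
    return replace(item, options=remaining, answer=None)


def make_iasd(item: VQAItem, unrelated_options: List[str]) -> VQAItem:
    """Incompatible Answer Set Detection (assumed): swap in options unrelated to the question."""
    return replace(item, options=list(unrelated_options), answer=None)


def make_ivqd(item: VQAItem, unrelated_image_id: str) -> VQAItem:
    """Incompatible Visual Question Detection (assumed): pair the question with a mismatched image."""
    return replace(item, image_id=unrelated_image_id, answer=None)


def is_correct(item: VQAItem, prediction: str) -> bool:
    """On a solvable item the model must pick the answer; on an unsolvable one it must abstain."""
    if item.answer is None:
        return prediction == ABSTAIN
    return prediction == item.answer


if __name__ == "__main__":
    base = VQAItem("img_001", "What color is the car?", ["red", "blue", "green"], "red")
    aad = make_aad(base)
    print(aad.options)               # ['blue', 'green']
    print(is_correct(aad, "blue"))   # False: picking a remaining option is an error
    print(is_correct(aad, ABSTAIN))  # True: withholding the answer is correct
```

The key design point the sketch tries to capture is that, on an unsolvable item, every concrete option counts as a failure; a model only scores by withholding its answer, which is the behavior UPD probes.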
