To Trust Or Not To Trust Your Vision-Language Model's Prediction
May 29, 2025
作者: Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink
cs.AI
Abstract
Vision-Language Models (VLMs) have demonstrated strong capabilities in
aligning visual and textual modalities, enabling a wide range of applications
in multimodal understanding and generation. While they excel in zero-shot and
transfer learning scenarios, VLMs remain susceptible to misclassification,
often yielding confident yet incorrect predictions. This limitation poses a
significant risk in safety-critical domains, where erroneous predictions can
lead to severe consequences. In this work, we introduce TrustVLM, a
training-free framework designed to address the critical challenge of
estimating when a VLM's predictions can be trusted. Motivated by the observed
modality gap in VLMs and the insight that certain concepts are more distinctly
represented in the image embedding space, we propose a novel confidence-scoring
function that leverages this space to improve misclassification detection. We
rigorously evaluate our approach across 17 diverse datasets, employing 4
architectures and 2 VLMs, and demonstrate state-of-the-art performance, with
improvements of up to 51.87% in AURC, 9.14% in AUROC, and 32.42% in FPR95
compared to existing baselines. By improving the reliability of the model
without requiring retraining, TrustVLM paves the way for safer deployment of
VLMs in real-world applications. The code will be available at
https://github.com/EPFL-IMOS/TrustVLM.
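The abstract's central idea is to score the trustworthiness of a zero-shot VLM prediction using the image embedding space rather than the image-text similarity alone. The sketch below is a minimal, generic illustration of that idea for a CLIP-style model, assuming the `open_clip` library and a hypothetical set of per-class image prototypes (`image_prototypes`) built from reference images; it is not TrustVLM's actual confidence-scoring function, whose exact form is given in the paper and code.

```python
# Illustrative sketch only: score confidence for a zero-shot prediction in the
# image embedding space. The prototype-similarity rule below is an assumption
# made for illustration, not the method proposed in the paper.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")


@torch.no_grad()
def zero_shot_predict_with_confidence(image, class_names, image_prototypes):
    """Predict via image-text similarity, then score trust in the image space.

    image_prototypes: (num_classes, dim) tensor of class-wise mean image
    embeddings, e.g. averaged from a few reference images per class
    (a hypothetical construction used only to illustrate the idea).
    """
    img = model.encode_image(preprocess(image).unsqueeze(0))
    img = img / img.norm(dim=-1, keepdim=True)

    text = tokenizer([f"a photo of a {c}" for c in class_names])
    txt = model.encode_text(text)
    txt = txt / txt.norm(dim=-1, keepdim=True)

    # Standard zero-shot prediction from image-text similarity.
    text_sims = (img @ txt.T).squeeze(0)
    pred = int(text_sims.argmax())

    # Confidence taken from the image embedding space: similarity between the
    # test image and the predicted class's image prototype. Low values flag
    # predictions that should not be trusted (potential misclassifications).
    proto = image_prototypes / image_prototypes.norm(dim=-1, keepdim=True)
    confidence = float((img @ proto.T).squeeze(0)[pred])
    return pred, confidence
```

In a misclassification-detection evaluation of this kind, such a confidence score would be thresholded (or ranked) to compute metrics like AURC, AUROC, and FPR95, which are the metrics reported in the abstract.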