To Trust Or Not To Trust Your Vision-Language Model's Prediction

Abstract

Vision-Language Models (VLMs) have demonstrated strong capabilities in aligning visual and textual modalities, enabling a wide range of applications in multimodal understanding and generation. While they excel in zero-shot and transfer learning scenarios, VLMs remain susceptible to misclassification, often yielding confident yet incorrect predictions. This limitation poses a significant risk in safety-critical domains, where erroneous predictions can lead to severe consequences. In this work, we introduce TrustVLM, a training-free framework designed to address the critical challenge of estimating when a VLM's predictions can be trusted. Motivated by the observed modality gap in VLMs and the insight that certain concepts are more distinctly represented in the image embedding space, we propose a novel confidence-scoring function that leverages this space to improve misclassification detection. We rigorously evaluate our approach across 17 diverse datasets, employing 4 architectures and 2 VLMs, and demonstrate state-of-the-art performance, with improvements of up to 51.87% in AURC, 9.14% in AUROC, and 32.42% in FPR95 compared to existing baselines. By improving the reliability of the model without requiring retraining, TrustVLM paves the way for safer deployment of VLMs in real-world applications. The code will be available at https://github.com/EPFL-IMOS/TrustVLM.
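For readers unfamiliar with the reported metrics, the following is a minimal sketch of how AUROC and FPR95 are commonly computed for misclassification detection. It is not taken from the TrustVLM code. It assumes that each test sample has a scalar confidence score and a flag saying whether the model's prediction was correct, that correctly classified samples are treated as the positive class, and that FPR95 means the false positive rate at a 95% true positive rate. The function names `auroc` and `fpr_at_95_tpr` are hypothetical, and AURC is omitted for brevity.

```python
import numpy as np


def auroc(confidence, correct):
    """Probability that a correct prediction receives a higher confidence
    than a misclassified one (ties count as one half)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos = confidence[correct]    # scores of correctly classified samples
    neg = confidence[~correct]   # scores of misclassified samples
    # Pairwise comparison, O(n_pos * n_neg): fine for a sketch, slow at scale.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def fpr_at_95_tpr(confidence, correct):
    """Fraction of misclassified samples still accepted at the threshold
    that retains at least 95% of the correctly classified samples."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    pos = np.sort(confidence[correct])[::-1]  # descending order
    neg = confidence[~correct]
    k = int(np.ceil(0.95 * pos.size))         # correct samples that must be kept
    threshold = pos[k - 1]
    return float((neg >= threshold).mean())


# Toy data (illustrative only, not results from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
is_correct = [1, 1, 0, 1, 0, 0]
print(auroc(scores, is_correct))
print(fpr_at_95_tpr(scores, is_correct))
```

A higher AUROC and a lower FPR95 both indicate that the confidence score separates correct predictions from misclassifications more cleanly.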
