RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models

November 6, 2024
Authors: Maya Varma, Jean-Benoit Delbrouck, Zhihong Chen, Akshay Chaudhari, Curtis Langlotz
cs.AI

Abstract

Fine-tuned vision-language models (VLMs) often capture spurious correlations between image features and textual attributes, resulting in degraded zero-shot performance at test time. Existing approaches for addressing spurious correlations (i) primarily operate at the global image level rather than intervening directly on fine-grained image features and (ii) are predominantly designed for unimodal settings. In this work, we present RaVL, which takes a fine-grained perspective on VLM robustness by discovering and mitigating spurious correlations using local image features rather than operating at the global image level. Given a fine-tuned VLM, RaVL first discovers spurious correlations by leveraging a region-level clustering approach to identify the precise image features contributing to zero-shot classification errors. Then, RaVL mitigates the identified spurious correlation with a novel region-aware loss function that enables the VLM to focus on relevant regions and ignore spurious relationships during fine-tuning. We evaluate RaVL on 654 VLMs with various model architectures, data domains, and learned spurious correlations. Our results show that RaVL accurately discovers (191% improvement over the closest baseline) and mitigates (8.2% improvement on worst-group image classification accuracy) spurious correlations. Qualitative evaluations on general-domain and medical-domain VLMs confirm our findings.
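The abstract reports its mitigation result as worst-group image classification accuracy. As a rough illustration of how that kind of metric is commonly computed, the sketch below takes the minimum per-group accuracy over a set of groups. The function name `worst_group_accuracy`, the toy predictions, and the group labels (a class paired with whether a spurious feature is present) are all assumptions made for illustration; the abstract does not specify how the authors define groups or implement their evaluation.

```python
from collections import defaultdict


def worst_group_accuracy(preds, labels, groups):
    """Return (worst per-group accuracy, dict of per-group accuracies).

    Generic definition of the metric, not the authors' evaluation code.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(preds, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Hypothetical toy data: each group pairs a class with the presence or
# absence of a spurious feature (an assumed grouping scheme).
preds = [1, 1, 0, 0, 1, 0]
labels = [1, 1, 0, 1, 1, 0]
groups = [
    "class1/spurious_present",
    "class1/spurious_present",
    "class1/spurious_absent",
    "class1/spurious_absent",
    "mixed/other",
    "mixed/other",
]

worst, per_group = worst_group_accuracy(preds, labels, groups)
overall = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
print(f"overall accuracy: {overall:.3f}")
print(f"worst-group accuracy: {worst:.3f}")
```

In this toy case the overall accuracy looks reasonable while one group sits at 50%, which is the kind of gap a worst-group metric is meant to expose.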
