How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

December 12, 2023
Authors: Zhongyi Han, Guanglin Zhou, Rundong He, Jindong Wang, Xing Xie, Tailin Wu, Yilong Yin, Salman Khan, Lina Yao, Tongliang Liu, Kun Zhang
cs.AI

Abstract

In machine learning, generalization under distribution shifts, where deployment conditions diverge from training scenarios, is crucial, particularly in fields such as climate modeling, biomedicine, and autonomous driving. The emergence of foundation models, distinguished by their extensive pretraining and task versatility, has increased interest in their adaptability to distribution shifts. GPT-4V(ision) is the most advanced publicly accessible multimodal foundation model, with extensive applications across various domains, including anomaly detection, video understanding, image generation, and medical diagnosis. However, its robustness to distribution shifts remains largely underexplored. To address this gap, this study rigorously evaluates GPT-4V's adaptability and generalization capabilities in dynamic environments, benchmarking it against prominent models such as CLIP and LLaVA. We examine GPT-4V's zero-shot generalization across 13 diverse datasets spanning natural, medical, and molecular domains. We further investigate its adaptability to controlled data perturbations and examine the efficacy of in-context learning as a tool to enhance its adaptation. Our findings delineate GPT-4V's capability boundaries under distribution shifts, shedding light on its strengths and limitations across various scenarios. This investigation contributes to our understanding of how AI foundation models generalize to distribution shifts, offering insights into their adaptability and robustness. Code is publicly available at https://github.com/jameszhou-gl/gpt-4v-distribution-shift.