How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation
December 12, 2023
Authors: Zhongyi Han, Guanglin Zhou, Rundong He, Jindong Wang, Xing Xie, Tailin Wu, Yilong Yin, Salman Khan, Lina Yao, Tongliang Liu, Kun Zhang
cs.AI
Abstract
In machine learning, generalization against distribution shifts -- where
deployment conditions diverge from the training scenarios -- is crucial,
particularly in fields like climate modeling, biomedicine, and autonomous
driving. The emergence of foundation models, distinguished by their extensive
pretraining and task versatility, has led to an increased interest in their
adaptability to distribution shifts. GPT-4V(ision) is the most advanced
publicly accessible multimodal foundation model, with extensive applications
across various domains, including anomaly detection, video understanding, image
generation, and medical diagnosis. However, its robustness to distribution
shifts remains largely underexplored. Addressing this gap, this study
rigorously evaluates GPT-4V's adaptability and generalization capabilities in
dynamic environments, benchmarking against prominent models like CLIP and
LLaVA. We delve into GPT-4V's zero-shot generalization across 13 diverse
datasets spanning natural, medical, and molecular domains. We further
investigate its adaptability to controlled data perturbations and examine the
efficacy of in-context learning as a tool to enhance its adaptation. Our
findings delineate GPT-4V's capability boundaries in distribution shifts,
shedding light on its strengths and limitations across various scenarios.
Importantly, this investigation contributes to our understanding of how AI
foundation models generalize to distribution shifts, offering pivotal insights
into their adaptability and robustness. Code is publicly available at
https://github.com/jameszhou-gl/gpt-4v-distribution-shift.