PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
June 16, 2025
Authors: Lukas Schiesser, Cornelius Wolff, Sophie Haas, Simon Pukrop
cs.AI
Abstract
Building image classification models remains cumbersome in data-scarce
domains, where collecting large labeled datasets is impractical. In-context
learning (ICL) has emerged as a promising paradigm for few-shot image
classification (FSIC), enabling models to generalize across domains without
gradient-based adaptation. However, prior work has largely overlooked a
critical component of ICL-based FSIC pipelines: the role of image embeddings.
In this work, we present PictSure, an ICL framework that places the embedding
model -- its architecture, pretraining, and training dynamics -- at the center
of analysis. We systematically examine the effects of different visual encoder
types, pretraining objectives, and fine-tuning strategies on downstream FSIC
performance. Our experiments show that training success and
out-of-domain performance depend heavily on how the embedding models are
pretrained. Consequently, PictSure outperforms existing ICL-based FSIC
models on out-of-domain benchmarks that differ significantly from the
training distribution, while maintaining comparable results on in-domain tasks.
Code can be found at https://github.com/PictSure/pictsure-library.
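To make the pipeline the abstract describes concrete, here is a minimal sketch of an ICL-based few-shot image classifier: a frozen pretrained encoder embeds support and query images, label embeddings are added, and a small transformer predicts the query's class from the in-context examples. This is an illustrative assumption of the general architecture, not the pictsure-library's actual API; the class name, the ResNet-18 backbone choice, and all hyperparameters are hypothetical.

```python
# Minimal sketch of an ICL-based FSIC model (illustrative, not the
# authors' implementation): frozen pretrained embeddings + transformer.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ICLClassifier(nn.Module):
    def __init__(self, n_classes: int = 5, d_model: int = 256):
        super().__init__()
        # Pretrained embedding model; per the paper's finding, how this
        # backbone is pretrained is the decisive factor for performance.
        backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Identity()           # expose 512-d features
        for p in backbone.parameters():       # frozen here for simplicity
            p.requires_grad = False
        self.encoder = backbone
        self.img_proj = nn.Linear(512, d_model)
        # Label embeddings; an extra "unknown" index marks the query token.
        self.label_emb = nn.Embedding(n_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, support_imgs, support_labels, query_img):
        # support_imgs: (B, K, 3, H, W), support_labels: (B, K) long,
        # query_img: (B, 3, H, W)
        B, K = support_imgs.shape[:2]
        imgs = torch.cat([support_imgs.flatten(0, 1), query_img], dim=0)
        feats = self.img_proj(self.encoder(imgs))
        sup = feats[: B * K].view(B, K, -1)
        qry = feats[B * K :].unsqueeze(1)
        # The query gets the "unknown" label index (= n_classes).
        unknown = torch.full((B, 1), self.label_emb.num_embeddings - 1,
                             dtype=torch.long, device=query_img.device)
        tokens = torch.cat([sup, qry], dim=1) + self.label_emb(
            torch.cat([support_labels, unknown], dim=1))
        out = self.transformer(tokens)
        return self.head(out[:, -1])          # logits for the query image

# Usage: a 5-way 1-shot episode with random tensors as stand-in data.
model = ICLClassifier()
logits = model(torch.randn(2, 5, 3, 224, 224),
               torch.arange(5).repeat(2, 1),
               torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 5])
```

In this sketch, swapping the backbone or its pretraining weights is the single knob the paper's analysis centers on; no gradient-based adaptation happens at inference, since classification is driven entirely by the in-context support set.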