PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
June 16, 2025
Authors: Lukas Schiesser, Cornelius Wolff, Sophie Haas, Simon Pukrop
cs.AI
Abstract
Building image classification models remains cumbersome in data-scarce
domains, where collecting large labeled datasets is impractical. In-context
learning (ICL) has emerged as a promising paradigm for few-shot image
classification (FSIC), enabling models to generalize across domains without
gradient-based adaptation. However, prior work has largely overlooked a
critical component of ICL-based FSIC pipelines: the role of image embeddings.
In this work, we present PictSure, an ICL framework that places the embedding
model -- its architecture, pretraining, and training dynamics -- at the center
of analysis. We systematically examine the effects of different visual encoder
types, pretraining objectives, and fine-tuning strategies on downstream FSIC
performance. Our experiments show that training success and
out-of-domain performance depend heavily on how the embedding models are
pretrained. Consequently, PictSure outperforms existing ICL-based FSIC
models on out-of-domain benchmarks that differ significantly from the
training distribution, while maintaining comparable results on in-domain tasks.
Code can be found at https://github.com/PictSure/pictsure-library.
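To make the pipeline the abstract describes concrete, here is a minimal sketch of an ICL-based few-shot image classifier: a frozen pretrained encoder embeds support and query images, label embeddings are added, and a small transformer predicts the query's class from the in-context examples. This is an illustrative assumption of the general architecture, not the pictsure-library's actual API; the class name, the ResNet-18 backbone choice, and all hyperparameters are hypothetical.

```python
# Minimal sketch of an ICL-based FSIC model (illustrative, not the
# authors' implementation): frozen pretrained embeddings + transformer.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ICLClassifier(nn.Module):
    def __init__(self, n_classes: int = 5, d_model: int = 256):
        super().__init__()
        # Pretrained embedding model; per the paper's finding, how this
        # backbone is pretrained is the decisive factor for performance.
        backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Identity()           # expose 512-d features
        for p in backbone.parameters():       # frozen here for simplicity
            p.requires_grad = False
        self.encoder = backbone
        self.img_proj = nn.Linear(512, d_model)
        # Label embeddings; an extra "unknown" index marks the query token.
        self.label_emb = nn.Embedding(n_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, support_imgs, support_labels, query_img):
        # support_imgs: (B, K, 3, H, W), support_labels: (B, K) long,
        # query_img: (B, 3, H, W)
        B, K = support_imgs.shape[:2]
        imgs = torch.cat([support_imgs.flatten(0, 1), query_img], dim=0)
        feats = self.img_proj(self.encoder(imgs))
        sup = feats[: B * K].view(B, K, -1)
        qry = feats[B * K :].unsqueeze(1)
        # The query gets the "unknown" label index (= n_classes).
        unknown = torch.full((B, 1), self.label_emb.num_embeddings - 1,
                             dtype=torch.long, device=query_img.device)
        tokens = torch.cat([sup, qry], dim=1) + self.label_emb(
            torch.cat([support_labels, unknown], dim=1))
        out = self.transformer(tokens)
        return self.head(out[:, -1])          # logits for the query image

# Usage: a 5-way 1-shot episode with random tensors as stand-in data.
model = ICLClassifier()
logits = model(torch.randn(2, 5, 3, 224, 224),
               torch.arange(5).repeat(2, 1),
               torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 5])
```

In this sketch, swapping the backbone or its pretraining weights is the single knob the paper's analysis centers on; no gradient-based adaptation happens at inference, since classification is driven entirely by the in-context support set.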