PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers

June 16, 2025
Authors: Lukas Schiesser, Cornelius Wolff, Sophie Haas, Simon Pukrop
cs.AI

Abstract

Building image classification models remains cumbersome in data-scarce domains, where collecting large labeled datasets is impractical. In-context learning (ICL) has emerged as a promising paradigm for few-shot image classification (FSIC), enabling models to generalize across domains without gradient-based adaptation. However, prior work has largely overlooked a critical component of ICL-based FSIC pipelines: the role of image embeddings. In this work, we present PictSure, an ICL framework that places the embedding model -- its architecture, pretraining, and training dynamics -- at the center of analysis. We systematically examine the effects of different visual encoder types, pretraining objectives, and fine-tuning strategies on downstream FSIC performance. Our experiments show that both training success and out-of-domain performance depend heavily on how the embedding model is pretrained. As a result, PictSure outperforms existing ICL-based FSIC models on out-of-domain benchmarks that differ significantly from the training distribution, while maintaining comparable results on in-domain tasks. Code can be found at https://github.com/PictSure/pictsure-library.
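To make the pipeline concrete, below is a minimal, hypothetical PyTorch sketch of an ICL-based FSIC forward pass in the spirit the abstract describes: a frozen pretrained embedding model encodes the support and query images, and a small transformer attends over (embedding, label) pairs from the support set to classify the query without any gradient-based adaptation. The class name ICLClassifier, the ResNet-18 backbone, and all dimensions are illustrative assumptions, not the PictSure implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch of an in-context few-shot image classifier.
# NOT the PictSure code; backbone choice and dimensions are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ICLClassifier(nn.Module):
    """A frozen pretrained image encoder produces embeddings; a transformer
    attends over (embedding, label) pairs from the support set plus the
    unlabeled query, so no gradient-based adaptation happens at test time."""
    def __init__(self, n_classes: int = 5, d_model: int = 256):
        super().__init__()
        # Pretrained embedding model -- the component the paper analyzes.
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # keep the 512-d pooled features
        self.encoder = backbone.eval()
        for p in self.encoder.parameters():  # frozen: no fine-tuning here
            p.requires_grad_(False)
        self.img_proj = nn.Linear(512, d_model)
        # One extra label id serves as the "unknown" token for the query.
        self.label_emb = nn.Embedding(n_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, support_imgs, support_labels, query_img):
        # support_imgs: (N, 3, H, W); support_labels: (N,); query_img: (1, 3, H, W)
        with torch.no_grad():
            sup = self.encoder(support_imgs)  # (N, 512)
            qry = self.encoder(query_img)     # (1, 512)
        unknown = torch.full((1,), self.label_emb.num_embeddings - 1,
                             dtype=torch.long, device=query_img.device)
        tokens = torch.cat([
            self.img_proj(sup) + self.label_emb(support_labels),
            self.img_proj(qry) + self.label_emb(unknown),
        ]).unsqueeze(0)                       # (1, N+1, d_model)
        out = self.transformer(tokens)
        return self.head(out[:, -1])          # logits for the query image

# Usage: a 5-way 1-shot episode with random tensors standing in for images.
model = ICLClassifier(n_classes=5)
support = torch.randn(5, 3, 224, 224)
labels = torch.arange(5)
query = torch.randn(1, 3, 224, 224)
print(model(support, labels, query).shape)    # torch.Size([1, 5])
```

In this reading of the abstract, swapping the backbone or its pretraining weights is exactly the variable PictSure studies: the transformer and label embeddings stay fixed while the embedding model's architecture and pretraining determine downstream FSIC performance.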