PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers

June 16, 2025
Authors: Lukas Schiesser, Cornelius Wolff, Sophie Haas, Simon Pukrop
cs.AI

Abstract

Building image classification models remains cumbersome in data-scarce domains, where collecting large labeled datasets is impractical. In-context learning (ICL) has emerged as a promising paradigm for few-shot image classification (FSIC), enabling models to generalize across domains without gradient-based adaptation. However, prior work has largely overlooked a critical component of ICL-based FSIC pipelines: the role of image embeddings. In this work, we present PictSure, an ICL framework that places the embedding model -- its architecture, pretraining, and training dynamics -- at the center of analysis. We systematically examine the effects of different visual encoder types, pretraining objectives, and fine-tuning strategies on downstream FSIC performance. Our experiments show that both training success and out-of-domain performance depend heavily on how the embedding model is pretrained. As a result, PictSure outperforms existing ICL-based FSIC models on out-of-domain benchmarks that differ significantly from the training distribution, while maintaining comparable results on in-domain tasks. Code can be found at https://github.com/PictSure/pictsure-library.
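To make the pipeline concrete, below is a minimal, hypothetical PyTorch sketch of an ICL-based FSIC forward pass in the spirit the abstract describes: a frozen pretrained embedding model encodes the support and query images, and a small transformer attends over (embedding, label) pairs from the support set to classify the query without any gradient-based adaptation. The class name ICLClassifier, the ResNet-18 backbone, and all dimensions are illustrative assumptions, not the PictSure implementation (see the linked repository for the actual code).

```python
# Hypothetical sketch of an in-context few-shot image classifier.
# NOT the PictSure code; backbone choice and dimensions are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ICLClassifier(nn.Module):
    """A frozen pretrained image encoder produces embeddings; a transformer
    attends over (embedding, label) pairs from the support set plus the
    unlabeled query, so no gradient-based adaptation happens at test time."""
    def __init__(self, n_classes: int = 5, d_model: int = 256):
        super().__init__()
        # Pretrained embedding model -- the component the paper analyzes.
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # keep the 512-d pooled features
        self.encoder = backbone.eval()
        for p in self.encoder.parameters():  # frozen: no fine-tuning here
            p.requires_grad_(False)
        self.img_proj = nn.Linear(512, d_model)
        # One extra label id serves as the "unknown" token for the query.
        self.label_emb = nn.Embedding(n_classes + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, support_imgs, support_labels, query_img):
        # support_imgs: (N, 3, H, W); support_labels: (N,); query_img: (1, 3, H, W)
        with torch.no_grad():
            sup = self.encoder(support_imgs)  # (N, 512)
            qry = self.encoder(query_img)     # (1, 512)
        unknown = torch.full((1,), self.label_emb.num_embeddings - 1,
                             dtype=torch.long, device=query_img.device)
        tokens = torch.cat([
            self.img_proj(sup) + self.label_emb(support_labels),
            self.img_proj(qry) + self.label_emb(unknown),
        ]).unsqueeze(0)                       # (1, N+1, d_model)
        out = self.transformer(tokens)
        return self.head(out[:, -1])          # logits for the query image

# Usage: a 5-way 1-shot episode with random tensors standing in for images.
model = ICLClassifier(n_classes=5)
support = torch.randn(5, 3, 224, 224)
labels = torch.arange(5)
query = torch.randn(1, 3, 224, 224)
print(model(support, labels, query).shape)    # torch.Size([1, 5])
```

In this reading of the abstract, swapping the backbone or its pretraining weights is exactly the variable PictSure studies: the transformer and label embeddings stay fixed while the embedding model's architecture and pretraining determine downstream FSIC performance.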