PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers

Abstract

Building image classification models remains cumbersome in data-scarce domains, where collecting large labeled datasets is impractical. In-context learning (ICL) has emerged as a promising paradigm for few-shot image classification (FSIC), enabling models to generalize across domains without gradient-based adaptation. However, prior work has largely overlooked a critical component of ICL-based FSIC pipelines: the role of image embeddings. In this work, we present PictSure, an ICL framework that places the embedding model (its architecture, pretraining, and training dynamics) at the center of analysis. We systematically examine how different visual encoder types, pretraining objectives, and fine-tuning strategies affect downstream FSIC performance. Our experiments show that both training success and out-of-domain performance depend strongly on how the embedding models are pretrained. As a result, PictSure outperforms existing ICL-based FSIC models on out-of-domain benchmarks that differ significantly from the training distribution, while maintaining comparable results on in-domain tasks. Code is available at https://github.com/PictSure/pictsure-library.
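The abstract does not say how PictSure formats its context, so the following is only a minimal sketch of a common ICL-based FSIC setup, not the authors' implementation. It assumes that image embeddings have already been computed by a frozen, pretrained encoder, and that each context token is an embedding with a one-hot label appended (the query gets an all-zero label slot). The names `sample_episode` and `build_context` are hypothetical, and the transformer that would read the token sequence and predict the query label is omitted. The sketch shows why embedding quality matters: the labeled support embeddings are the only information about the new task, and no gradient step is taken.

```python
import random
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Example = Tuple[Vector, int]  # (precomputed embedding, episode-local label)


def sample_episode(
    pool: Dict[str, List[Vector]], n_way: int, k_shot: int, rng: random.Random
) -> Tuple[List[Example], Example]:
    """Sample an N-way K-shot support set plus one held-out query (hypothetical helper)."""
    classes = rng.sample(sorted(pool), n_way)
    query_label = rng.randrange(n_way)
    support: List[Example] = []
    query: Optional[Example] = None
    for label, cls in enumerate(classes):
        # Draw one extra item from the query class so the query never appears in the support set.
        extra = 1 if label == query_label else 0
        items = rng.sample(pool[cls], k_shot + extra)
        support.extend((x, label) for x in items[:k_shot])
        if extra:
            query = (items[k_shot], label)
    rng.shuffle(support)
    assert query is not None
    return support, query


def build_context(support: List[Example], query_embedding: Vector, n_way: int) -> List[Vector]:
    """Turn the episode into tokens: [embedding ; one-hot label], query label slot left as zeros."""

    def one_hot(i: int) -> Vector:
        return [1.0 if j == i else 0.0 for j in range(n_way)]

    tokens = [list(e) + one_hot(y) for e, y in support]
    tokens.append(list(query_embedding) + [0.0] * n_way)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for encoder outputs: 5 classes, 10 four-dimensional "embeddings" each.
    pool = {f"c{i}": [[float(i), float(j), 0.0, 1.0] for j in range(10)] for i in range(5)}
    support, query = sample_episode(pool, n_way=3, k_shot=2, rng=rng)
    tokens = build_context(support, query[0], n_way=3)
    print(len(tokens), len(tokens[0]))
```

Keeping the encoder fixed and precomputing embeddings, as assumed here, is only one option; the abstract indicates that PictSure also studies fine-tuning strategies for the embedding model, in which case the embeddings would be produced inside the training loop instead.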
