GeneCIS: A Benchmark for General Conditional Image Similarity

Abstract

We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically. This contrasts with most representation learning methods, supervised or self-supervised, which learn a fixed embedding function and hence implicitly assume a single notion of similarity. For instance, models trained on ImageNet are biased towards object categories, while a user might prefer the model to focus on colors, textures or specific elements in the scene. In this paper, we propose the GeneCIS ('genesis') benchmark, which measures models' ability to adapt to a range of similarity conditions. Extending prior work, our benchmark is designed for zero-shot evaluation only, and hence considers an open set of similarity conditions. We find that baselines from powerful CLIP models struggle on GeneCIS and that performance on the benchmark is only weakly correlated with ImageNet accuracy, suggesting that simply scaling existing methods is not fruitful. We further propose a simple, scalable solution based on automatically mining information from existing image-caption datasets. We find our method offers a substantial boost over the baselines on GeneCIS, and further improves zero-shot performance on related image retrieval benchmarks. In fact, though evaluated zero-shot, our model surpasses state-of-the-art supervised models on MIT-States. Project page at https://sgvaze.github.io/genecis/.
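The abstract does not spell out how a conditional-similarity query is scored. The following is a minimal sketch that assumes three things. First, the task is posed as retrieval: a reference image plus a short text condition, ranked against a gallery. Second, the model is a simple additive "image + text" baseline over unit-normalized embeddings. Third, Recall@K is the metric. The toy 3-d vectors stand in for real CLIP features. The function names (`conditional_query`, `rank_gallery`, `recall_at_k`) are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conditional_query(image_emb: Vector, text_emb: Vector) -> List[float]:
    """Hypothetical 'image + text' baseline: sum the unit-normalized reference-image
    and condition-text embeddings, then renormalize the result."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return l2_normalize([a + b for a, b in zip(img, txt)])


def rank_gallery(query: Vector, gallery: Sequence[Vector]) -> List[int]:
    """Gallery indices sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = [dot(q, l2_normalize(g)) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking: Sequence[int], target: int, k: int) -> float:
    """1.0 if the ground-truth target index is among the top-k results, else 0.0."""
    return 1.0 if target in ranking[:k] else 0.0


if __name__ == "__main__":
    # Toy 3-d stand-ins for encoder outputs (assumed, not real CLIP features).
    reference = [1.0, 0.0, 0.0]   # reference image
    condition = [0.0, 1.0, 0.0]   # text condition
    gallery = [
        [1.0, 0.0, 0.0],  # 0: distractor identical to the reference
        [0.6, 0.8, 0.0],  # 1: target that satisfies the condition
        [0.0, 0.0, 1.0],  # 2: unrelated image
    ]
    image_only = rank_gallery(reference, gallery)
    conditioned = rank_gallery(conditional_query(reference, condition), gallery)
    print(image_only, recall_at_k(image_only, target=1, k=1))    # [0, 1, 2] 0.0
    print(conditioned, recall_at_k(conditioned, target=1, k=1))  # [1, 0, 2] 1.0
```

In this toy setup, the image-only query ranks first the distractor that matches the reference exactly. Adding the condition embedding moves the target to rank 1. Normalizing both embeddings before summing keeps either modality from dominating the query purely because of its scale.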
