Context-Aware Meta-Learning

Abstract

Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. However, visual models trained to detect new objects during inference have been unable to replicate this ability, and instead either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning. Our approach leverages a frozen pre-trained feature extractor and, analogous to in-context learning, recasts meta-learning as sequence modeling over datapoints with known labels and a test datapoint with an unknown label. On 8 out of 11 meta-learning benchmarks, our approach, without meta-training or fine-tuning, exceeds or matches the state-of-the-art algorithm, P>M>F, which is meta-trained on these benchmarks.
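To make the sequence-modeling framing concrete, the sketch below shows one way a few-shot task could be laid out as a sequence: each labeled datapoint becomes a token formed from its frozen features plus a label encoding, and the test datapoint is appended with a placeholder for its unknown label. This is an illustration under stated assumptions, not the paper's implementation: `frozen_features`, `one_hot`, and `build_sequence` are hypothetical names, the toy feature map stands in for a real pre-trained extractor, the zero vector used for the unknown label is an assumption, and the sequence model that would consume the tokens is omitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def frozen_features(image: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a frozen pre-trained feature extractor.

    It is a fixed map with no trainable parameters: mean, max, and min.
    """
    return [sum(image) / len(image), max(image), min(image)]


def one_hot(label: int, num_classes: int) -> Vector:
    """Encode a known label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def build_sequence(
    support: Sequence[Tuple[Sequence[float], int]],
    query: Sequence[float],
    num_classes: int,
) -> List[Vector]:
    """Lay out labeled datapoints and one test datapoint as a token sequence."""
    # Assumption: the unknown label is represented by an all-zero placeholder.
    unknown_label = [0.0] * num_classes
    # One token per labeled datapoint: features followed by the label encoding.
    tokens = [
        frozen_features(image) + one_hot(label, num_classes)
        for image, label in support
    ]
    # The test datapoint goes last, with the placeholder in the label slot.
    tokens.append(frozen_features(query) + unknown_label)
    return tokens


if __name__ == "__main__":
    support_set = [([0.0, 1.0], 0), ([2.0, 4.0], 1)]
    for token in build_sequence(support_set, [1.0, 3.0], num_classes=2):
        print(token)
```

A sequence model trained on such inputs would predict the label for the final token from the labeled tokens that precede it, which is the sense in which the setup resembles in-context learning.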
