In-Context Multiple Instance Learning

Abstract

Multiple Instance Learning (MIL) addresses problems where supervision is available only at the level of bags (sets of instances), and it has been applied successfully in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications: flexible models overfit, and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate several synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
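
To make the notion of bag-structured synthetic data concrete, the following is a minimal sketch of one possible generator. It is not one of the generators studied in this work. It assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), Gaussian instances, and a random linear concept; the function name `sample_mil_task` and all parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_mil_task(rng, n_bags=8, dim=4, max_instances=10, threshold=1.5):
    """Sample one synthetic MIL task (illustrative assumption, not the paper's generator).

    Returns a list of bags (each an array of shape [n_instances, dim]),
    an array of binary bag labels, and the hidden concept direction w.
    """
    # Hidden task-specific concept: a random unit direction in feature space.
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)

    bags, labels = [], []
    for _ in range(n_bags):
        # Bags have a variable number of instances (between 1 and max_instances).
        n_instances = int(rng.integers(1, max_instances + 1))
        bag = rng.normal(size=(n_instances, dim))
        # Instance labels stay hidden; only the bag label is exposed.
        instance_positive = bag @ w > threshold
        bags.append(bag)
        labels.append(int(instance_positive.any()))
    return bags, np.array(labels), w


rng = np.random.default_rng(0)
bags, labels, w = sample_mil_task(rng)

# Split into a few labeled "context" bags and unlabeled "query" bags,
# mirroring the few-shot setting described in the abstract.
context_bags, context_labels = bags[:6], labels[:6]
query_bags = bags[6:]
```

In an in-context setup, a pretrained model would receive `context_bags`, `context_labels`, and `query_bags` together and predict the query labels in one forward pass; many such tasks, drawn from a mixture of generators, would form the pretraining data.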