In-Context Multiple Instance Learning

Abstract

Multiple Instance Learning (MIL) addresses problems where supervision is available only at the level of bags of instances, and it has been applied successfully in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications: flexible models overfit, while rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate several synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
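
To make the notion of a synthetic, bag-structured task concrete, the sketch below samples one toy MIL task and splits it into labeled context bags and unlabeled query bags, which is the form of input an in-context learner would consume in a single forward pass. The abstract does not describe the generators themselves, so everything here is an assumption made for illustration: the names `sample_mil_task` and `split_context_query`, the Gaussian instance distribution, and the classical MIL rule that a bag is positive if it contains at least one "witness" instance.

```python
import numpy as np


def sample_mil_task(rng, n_bags=8, min_size=3, max_size=10, dim=4, witness_prob=0.15):
    """Sample one toy MIL task (hypothetical generator, not the paper's).

    Assumption: the classical MIL rule, where a bag is positive
    if and only if it contains at least one witness instance.
    """
    # Task-specific offset that separates witness instances from background.
    mu = rng.normal(scale=3.0, size=dim)
    bags, labels = [], []
    for _ in range(n_bags):
        size = rng.integers(min_size, max_size + 1)  # bags vary in size
        is_witness = rng.random(size) < witness_prob
        # Background instances ~ N(0, I); witnesses are shifted by mu.
        x = rng.normal(size=(size, dim)) + is_witness[:, None] * mu
        bags.append(x)
        labels.append(int(is_witness.any()))
    return bags, np.array(labels)


def split_context_query(bags, labels, n_context):
    """Split a task into labeled context bags and held-out query bags."""
    context = list(zip(bags[:n_context], labels[:n_context]))
    query_bags = bags[n_context:]
    query_labels = labels[n_context:]  # used only as pretraining targets
    return context, query_bags, query_labels


rng = np.random.default_rng(0)
bags, labels = sample_mil_task(rng)
context, query_bags, query_labels = split_context_query(bags, labels, n_context=5)
print(len(context), "context bags,", len(query_bags), "query bags")
```

During pretraining, many such tasks would be drawn and the model trained to predict `query_labels` from the context; mixing several generators with different labeling rules is one way to read the "complementary inductive biases" mentioned above.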