In-Context Multiple Instance Learning

Abstract

Multiple Instance Learning (MIL) addresses problems where supervision is available only at the level of bags of instances, and it has been successfully applied in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications: flexible models overfit, and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate several synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
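To make the setting concrete, the following is a minimal sketch of what a few-shot MIL task built from synthetic bag-structured data might look like. It is not one of the generators studied in this work: it assumes the standard MIL assumption (a bag is positive when it contains at least one positive "witness" instance) and Gaussian instance features. The names `sample_bag` and `sample_task` and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_bag(rng, n_instances=8, dim=4, positive=False, shift=3.0):
    """Draw one bag: an (n_instances, dim) array plus a bag-level label.

    Assumption for illustration: a positive bag holds exactly one shifted
    "witness" instance; negative bags contain only background instances.
    """
    instances = rng.normal(size=(n_instances, dim))
    if positive:
        witness = rng.integers(n_instances)  # index of the witness instance
        instances[witness] += shift
    # Only the bag-level label is returned; the witness index stays hidden.
    return instances, int(positive)


def sample_task(rng, n_context=6, dim=4):
    """Build a few-shot task: labeled context bags and one query bag."""
    labels = rng.integers(0, 2, size=n_context)
    context = [sample_bag(rng, dim=dim, positive=bool(y)) for y in labels]
    query_bag, query_label = sample_bag(
        rng, dim=dim, positive=bool(rng.integers(0, 2))
    )
    return context, query_bag, query_label


rng = np.random.default_rng(0)
context, query_bag, query_label = sample_task(rng)
```

Under these assumptions, an in-context learner would receive `context` together with `query_bag` and predict `query_label` in one forward pass; pretraining would repeat this over many sampled tasks.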