In-Context Multiple Instance Learning

Abstract

Multiple Instance Learning (MIL) addresses problems in which supervision is available only at the level of bags of instances, and it has been applied successfully in fields ranging from computational pathology to satellite imagery. Nevertheless, existing algorithms struggle in the low-label regime that characterizes many real-world applications: flexible models overfit, and rigid ones fail to adapt to the task at hand. We show that pretraining an in-context learner with a Perceiver-style architecture on synthetic data yields a model that can solve new tasks from a handful of labeled bags. At inference time, classification happens in a single forward pass and requires no gradient updates. We propose and investigate several synthetic data generators for bag-structured data and find that they capture complementary inductive biases. A model pretrained on a mixture of these generators inherits their per-task strengths and achieves the best average performance across twelve MIL benchmarks, outperforming supervised baselines that require task-specific training.
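To make the setting concrete, the sketch below shows what one synthetic episode of bag-structured data might look like: a few labeled context bags plus query bags to classify. It is an illustration only and not one of the generators proposed in this work. It assumes the standard MIL rule that a bag is positive when it contains at least one positive ("witness") instance, and every name and parameter in it (`make_bag`, `make_episode`, `shift`, the bag-size range) is hypothetical.

```python
import random


def make_bag(rng, n_instances, dim, positive, shift=3.0):
    """Return one bag: a list of n_instances feature vectors of length dim.

    Assumption (illustrative, not from the paper): instances are standard
    Gaussian noise, and a positive bag hides exactly one witness instance
    whose first feature is shifted by `shift`.
    """
    bag = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_instances)]
    if positive:
        witness = rng.randrange(n_instances)
        bag[witness][0] += shift
    return bag


def make_episode(seed, n_context=8, n_query=4, dim=5, min_size=3, max_size=10):
    """Sample one episode: labeled context bags plus query bags.

    An in-context learner would read the context bags with their labels and
    predict the query labels in a single forward pass, without any gradient
    updates. Bags may differ in size; only bag-level labels are produced.
    """
    rng = random.Random(seed)

    def sample(n_bags):
        bags, labels = [], []
        for _ in range(n_bags):
            label = rng.randint(0, 1)               # bag-level label only
            size = rng.randint(min_size, max_size)  # variable bag size
            bags.append(make_bag(rng, size, dim, positive=bool(label)))
            labels.append(label)
        return bags, labels

    context_bags, context_labels = sample(n_context)
    query_bags, query_labels = sample(n_query)
    return {
        "context_bags": context_bags,
        "context_labels": context_labels,
        "query_bags": query_bags,
        "query_labels": query_labels,
    }


if __name__ == "__main__":
    episode = make_episode(seed=0)
    sizes = [len(bag) for bag in episode["context_bags"]]
    print("context bag sizes:", sizes)
    print("context labels:   ", episode["context_labels"])
```

Pretraining on many such episodes, drawn from a mixture of different generators, is what would expose the model to varied inductive biases. The single planted-witness rule here stands in for just one possible generator.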