QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Abstract

Attention-based Multiple Instance Learning (MIL) aggregators in medical imaging are prone to attention concentration, which produces overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this problem through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances, without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL on six benchmarks spanning whole-slide pathology and cell-level hematology, two fundamentally different MIL scales. The best-performing QG-MIL variants outperform the leading baselines on all six benchmarks, with an average improvement of +6.1 macro-F1 points. Attention overlays and attention mass analysis confirm more distributed instance weighting. Ablation studies show that, although individual components can match the full model on specific datasets, the complete QG-MIL design yields the most consistent cross-domain performance and the lowest variance relative to the selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL
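To make the four components concrete, the NumPy sketch below combines them into one pre-norm transformer block applied to a bag of instance embeddings. It is a minimal illustration under our own assumptions, not the released QG-MIL code: it assumes QK normalization is an RMSNorm over each head's dimension, that the output gate is an elementwise sigmoid of a linear projection of the normalized input applied to the concatenated head outputs before the output projection, and that the bag is pooled by a simple mean. The names `gated_block` and `init_params`, the dimensions, and the random (untrained) weights are all hypothetical.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the last axis (no mean subtraction).
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def silu(x):
    return x / (1.0 + np.exp(-x))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d, n_heads, d_ff, rng):
    # Hypothetical random initialization; a real model would learn these weights.
    hd = d // n_heads
    s, s_ff = 1.0 / np.sqrt(d), 1.0 / np.sqrt(d_ff)
    return {
        "norm1": np.ones(d), "norm2": np.ones(d),      # pre-norm gains
        "q_norm": np.ones(hd), "k_norm": np.ones(hd),  # per-head QK-norm gains
        "wq": rng.normal(0, s, (d, d)), "wk": rng.normal(0, s, (d, d)),
        "wv": rng.normal(0, s, (d, d)), "wo": rng.normal(0, s, (d, d)),
        "wg": rng.normal(0, s, (d, d)),                # output-gate projection
        "w1": rng.normal(0, s, (d, d_ff)), "w3": rng.normal(0, s, (d, d_ff)),
        "w2": rng.normal(0, s_ff, (d_ff, d)),          # SwiGLU projections
    }


def gated_block(x, p, n_heads):
    n, d = x.shape
    hd = d // n_heads

    def split(t):  # (n, d) -> (heads, n, hd)
        return t.reshape(n, n_heads, hd).transpose(1, 0, 2)

    h = rms_norm(x, p["norm1"])                       # 1) RMSNorm pre-normalization
    q = rms_norm(split(h @ p["wq"]), p["q_norm"])     # 2) per-head QK normalization
    k = rms_norm(split(h @ p["wk"]), p["k_norm"])
    v = split(h @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd))  # (heads, n, n)
    ctx = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    gate = sigmoid(h @ p["wg"])                       # 3) elementwise output gate in (0, 1)
    x = x + (ctx * gate) @ p["wo"]                    # residual connection

    h2 = rms_norm(x, p["norm2"])
    ffn = (silu(h2 @ p["w1"]) * (h2 @ p["w3"])) @ p["w2"]  # 4) SwiGLU feed-forward
    return x + ffn, attn


# Usage: a hypothetical bag of 50 instance embeddings of dimension 64.
rng = np.random.default_rng(0)
bag = rng.normal(size=(50, 64))
params = init_params(d=64, n_heads=4, d_ff=128, rng=rng)
out, attn = gated_block(bag, params, n_heads=4)
bag_embedding = out.mean(axis=0)  # assumed mean pooling before a classifier head
```

Because the gate is bounded in (0, 1) and computed per element, it can only attenuate the attention output. This is one plausible reading of how gating curbs over-reliance on a few instances; the paper's exact gate placement may differ.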