QG-MIL: A Gated Transformer Aggregator for Domain-Agnostic Multiple Instance Learning in Medical Imaging

Abstract

Attention-based Multiple Instance Learning (MIL) aggregators in medical imaging are prone to attention concentration, which produces overconfident and unstable predictions. We introduce QG-MIL, a gated transformer aggregator that addresses this problem through four synergistic architectural components: RMSNorm-based pre-normalization, per-head QK normalization, fine-grained attention output gating, and SwiGLU-style feed-forward modules. Together, these design choices stabilize training and distribute attention more uniformly across instances, without auxiliary losses, masking, or multi-stage regularization. We evaluate QG-MIL on six benchmarks spanning whole-slide pathology and cell-level hematology, two MIL settings that operate at fundamentally different scales. The best-performing QG-MIL variants outperform the leading baselines on all six benchmarks, with an average gain of 6.1 points in mean macro F1. Attention overlays and an attention mass analysis confirm that instance weights are more evenly distributed. Ablation studies show that, although individual components can match the full model on specific datasets, the full QG-MIL design provides the most consistent cross-domain performance and the lowest variance relative to the selected baselines. We release a configurable implementation to support reproducibility at: https://github.com/unica-visual-intelligence-lab/QG-MIL
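To make the four components concrete, the following is a minimal NumPy sketch of one gated transformer block applied to a bag of instance features. It is an illustration under stated assumptions, not the released implementation: the abstract names the components but not their wiring, so the sigmoid gate computed from the normalized input, the gate placement before the output projection, the fixed 1/sqrt(d) scaling after QK normalization, the absence of biases, and all names (`init_params`, `gated_block`, `topk_attention_mass`) are hypothetical choices. The helper at the end shows one plausible way to quantify attention mass, namely the share of weight held by the k most-attended instances.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square over the last axis, then scale."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(dim, num_heads, hidden, seed=0):
    """Random weights for one block (hypothetical layout, no biases)."""
    rng = np.random.default_rng(seed)
    head_dim = dim // num_heads
    w = lambda i, o: rng.normal(0.0, i ** -0.5, size=(i, o))
    return {
        "norm1": np.ones(dim), "norm2": np.ones(dim),
        "q_norm": np.ones(head_dim), "k_norm": np.ones(head_dim),
        "wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim),
        "wg": w(dim, dim), "wo": w(dim, dim),
        "w_gate": w(dim, hidden), "w_up": w(dim, hidden), "w_down": w(hidden, dim),
    }

def gated_block(x, p, num_heads):
    """x: (N, dim) bag of instance features. Returns (output, attention)."""
    n, dim = x.shape
    head_dim = dim // num_heads
    split = lambda t: t.reshape(n, num_heads, head_dim).transpose(1, 0, 2)  # (H, N, d)

    h = rms_norm(x, p["norm1"])                        # 1. RMSNorm pre-normalization
    q = rms_norm(split(h @ p["wq"]), p["q_norm"])      # 2. per-head QK normalization
    k = rms_norm(split(h @ p["wk"]), p["k_norm"])
    v = split(h @ p["wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))  # (H, N, N)
    ctx = (attn @ v).transpose(1, 0, 2).reshape(n, dim)
    ctx = ctx * sigmoid(h @ p["wg"])                   # 3. elementwise output gate (assumed form)
    x = x + ctx @ p["wo"]                              # residual connection

    h = rms_norm(x, p["norm2"])
    swish = (h @ p["w_gate"]) * sigmoid(h @ p["w_gate"])           # SiLU activation
    x = x + (swish * (h @ p["w_up"])) @ p["w_down"]    # 4. SwiGLU-style feed-forward
    return x, attn

def topk_attention_mass(weights, k):
    """Fraction of total attention mass held by the k largest weights."""
    w = np.sort(np.asarray(weights, dtype=float))[::-1]
    return float(w[:k].sum() / w.sum())

# Usage: a bag of 16 instances with 32-dimensional features.
bag = np.random.default_rng(1).normal(size=(16, 32))
params = init_params(dim=32, num_heads=4, hidden=64)
out, attn = gated_block(bag, params, num_heads=4)
per_instance = attn.mean(axis=(0, 1))  # average weight each instance receives
print(out.shape, topk_attention_mass(per_instance, k=3))
```

A top-k mass close to k/N indicates evenly spread attention, while a value near 1 indicates that a few instances dominate. Because the weights here are random and untrained, the printed value says nothing about the behavior reported for QG-MIL.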