Finding a Needle in a Haystack: Weakly Supervised Instance-Level Log Anomaly Localization via Counterfactual Perturbation

Abstract

Log anomaly detection is a critical task for system operations and security assurance. In large-scale networked systems, however, logs are produced in massive volumes while instance-level annotation is prohibitively expensive, which makes fine-grained anomaly localization difficult. To address this challenge, we propose LogMILP (Log anomaly localization based on Multi-Instance Learning enhanced by prototypes and Perturbation), a weakly supervised framework that performs both bag-level anomaly detection and instance-level anomaly localization using only bag-level labels. The method combines prototype-guided structural modeling with counterfactual perturbation consistency regularization to guide the model toward the critical log entries, thereby improving localization reliability and interpretability under coarse-grained supervision. Experimental results on three public datasets show that LogMILP achieves competitive detection performance while yielding significantly more reliable instance-level localization. Our code is open-sourced at https://github.com/YUK1207/LogMILP.