

TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text

May 17, 2025
Authors: Ahmed Lekssays, Utsav Shukla, Husrev Taha Sencar, Md Rizwan Parvez
cs.AI

Abstract

Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations, such as custom hard-negative mining and denoising; such resources are rarely available in specialized domains. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned large language models (LLMs), and a small set of text-technique pairs. Our approach addresses data scarcity by fine-tuning only the generation component on limited in-domain examples, avoiding resource-intensive retriever training. While conventional RAG mitigates hallucination by coupling retrieval with generation, its reliance on generic retrievers often introduces noisy candidates, limiting domain-specific precision. To address this, we improve retrieval quality and domain specificity through zero-shot LLM re-ranking, which explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks show that TechniqueRAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, and a comprehensive analysis provides further insights.
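The abstract describes three stages: an off-the-shelf retriever proposes candidate techniques, a zero-shot LLM re-ranks them, and a fine-tuned generator produces the final annotations. The following is only a minimal sketch of that data flow under stated assumptions, not the authors' implementation. BM25 stands in for the off-the-shelf retriever. The four-entry catalog is hypothetical (real MITRE ATT&CK IDs, paraphrased descriptions). `identity_reranker` is a placeholder for the zero-shot LLM call, and every function name is hypothetical. No model is actually called; the sketch stops at building the generator's prompt.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical mini-catalog: the IDs are real MITRE ATT&CK technique IDs,
# but the descriptions are short paraphrases written for this sketch.
CATALOG = {
    "T1566": "Phishing: adversaries send emails with malicious attachments or links to gain access.",
    "T1059": "Command and Scripting Interpreter: adversaries abuse PowerShell, shells, or scripts to execute commands.",
    "T1486": "Data Encrypted for Impact: adversaries encrypt files on target systems to disrupt availability, as in ransomware.",
    "T1003": "OS Credential Dumping: adversaries dump credentials such as password hashes from memory or the SAM database.",
}

# A re-ranker maps (text, candidate IDs) to a reordered list of IDs.
Reranker = Callable[[str, list[str]], list[str]]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_retrieve(query: str, catalog: dict[str, str], k: int = 3,
                  k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Stage 1: untrained sparse retrieval (BM25 stands in for an off-the-shelf retriever)."""
    ids = list(catalog)
    docs = [tokenize(catalog[i]) for i in ids]
    avgdl = sum(len(d) for d in docs) / len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = {}
    for tid, doc in zip(ids, docs):
        tf = Counter(doc)
        score = 0.0
        for t in tokenize(query):
            if t in tf:
                idf = math.log((len(docs) - df[t] + 0.5) / (df[t] + 0.5) + 1)
                norm = k1 * (1 - b + b * len(doc) / avgdl)  # length normalization
                score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores[tid] = score
    return sorted(ids, key=scores.__getitem__, reverse=True)[:k]


def build_rerank_prompt(text: str, candidates: list[str], catalog: dict[str, str]) -> str:
    """Stage 2 input: a zero-shot prompt asking an LLM to reorder the candidates."""
    options = "\n".join(f"{c}: {catalog[c]}" for c in candidates)
    return (f"Rank these ATT&CK techniques by relevance to the text.\n{options}\n\n"
            f"Text: {text}\nReturn technique IDs, most relevant first.")


def rerank(text: str, candidates: list[str], llm_rerank: Reranker, top_n: int = 2) -> list[str]:
    """Stage 2: apply the LLM ranking, dropping any ID the retriever never returned."""
    ranked = [c for c in llm_rerank(text, candidates) if c in candidates]
    return ranked[:top_n]


def build_generation_prompt(text: str, ranked: list[str], catalog: dict[str, str]) -> str:
    """Stage 3 input: the prompt a fine-tuned generator would complete with technique IDs."""
    context = "\n".join(f"{c}: {catalog[c]}" for c in ranked)
    return f"Candidate techniques:\n{context}\n\nText: {text}\nTechniques used:"


def identity_reranker(text: str, candidates: list[str]) -> list[str]:
    # Placeholder: a real system would send a prompt such as
    # build_rerank_prompt(text, candidates, CATALOG) to an LLM and parse
    # the returned ID order. Here the retriever's order is kept unchanged.
    return list(candidates)


report = "The actor sent phishing emails with malicious attachments, then ran PowerShell commands."
candidates = bm25_retrieve(report, CATALOG)             # ['T1566', 'T1059', 'T1003']
ranked = rerank(report, candidates, identity_reranker)  # ['T1566', 'T1059']
print(build_generation_prompt(report, ranked, CATALOG))
```

Note that the generic retriever still surfaces T1003 on a single shared stopword ("the"), which is the kind of noisy candidate the re-ranking stage is meant to remove. Filtering the re-ranker's output against the retrieved set is one simple guard against an LLM returning IDs that were never retrieved; the abstract does not say whether TechniqueRAG does this.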

May 20, 2025