TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text

Abstract

Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations, such as custom hard-negative mining and denoising; these resources are rarely available in specialized domains. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned large language models (LLMs), and a small number of text-technique pairs. Our approach addresses data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing the need for resource-intensive retrieval training. While conventional RAG mitigates hallucination by coupling retrieval and generation, its reliance on generic retrievers often introduces noisy candidates, limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking, which explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TechniqueRAG achieves state-of-the-art performance without extensive task-specific optimizations or large labeled datasets, while comprehensive analysis provides further insights.
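The retrieve, re-rank, and generate flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the three-entry catalog and its ATT&CK-style IDs, the token-overlap retriever standing in for an off-the-shelf retriever, and the `judge` and `generate` callables standing in for the zero-shot LLM re-ranker and the fine-tuned generator.

```python
from typing import Callable, List, Tuple

Technique = Tuple[str, str]  # (technique_id, description)

# Illustrative catalog (assumption); a real system would index a full
# technique knowledge base rather than three hand-written entries.
CATALOG: List[Technique] = [
    ("T1566", "phishing email with malicious attachment or link"),
    ("T1059", "command and scripting interpreter used to execute commands"),
    ("T1003", "os credential dumping from memory or registry"),
]


def _tokens(text: str) -> set:
    """Lowercased whitespace tokens; deliberately naive."""
    return set(text.lower().split())


def retrieve(query: str, catalog: List[Technique], k: int) -> List[Technique]:
    """Stand-in for an off-the-shelf retriever: rank by Jaccard token overlap."""
    q = _tokens(query)

    def overlap(item: Technique) -> float:
        d = _tokens(item[1])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(catalog, key=overlap, reverse=True)[:k]


def rerank(
    query: str,
    candidates: List[Technique],
    judge: Callable[[str, Technique], float],
    top_n: int,
) -> List[Technique]:
    """Reorder candidates by a relevance judge.

    In the setting the abstract describes, the judge would be an LLM
    prompted zero-shot; here it is any callable returning a score.
    """
    return sorted(candidates, key=lambda c: judge(query, c), reverse=True)[:top_n]


def build_prompt(query: str, candidates: List[Technique]) -> str:
    """Format the shortlisted techniques and the input text for the generator."""
    lines = [f"- {tid}: {desc}" for tid, desc in candidates]
    return (
        "Candidate techniques:\n"
        + "\n".join(lines)
        + f"\n\nText: {query}\nTechnique IDs:"
    )


def annotate(
    query: str,
    catalog: List[Technique],
    judge: Callable[[str, Technique], float],
    generate: Callable[[str], str],
    k: int = 3,
    top_n: int = 2,
) -> str:
    """Retrieve k candidates, keep the top_n after re-ranking, then generate."""
    candidates = retrieve(query, catalog, k)
    shortlisted = rerank(query, candidates, judge, top_n)
    return generate(build_prompt(query, shortlisted))
```

In this sketch only `generate` would correspond to a trained component; the retriever and the re-ranking judge are used as-is, which mirrors the abstract's point that retrieval training is avoided.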
