TechniqueRAG: Retrieval Augmented Generation for Adversarial Technique Annotation in Cyber Threat Intelligence Text

May 17, 2025
Authors: Ahmed Lekssays, Utsav Shukla, Husrev Taha Sencar, Md Rizwan Parvez
cs.AI

Abstract

Accurately identifying adversarial techniques in security texts is critical for effective cyber defense. However, existing methods face a fundamental trade-off: they either rely on generic models with limited domain precision or require resource-intensive pipelines that depend on large labeled datasets and task-specific optimizations, such as custom hard-negative mining and denoising, resources rarely available in specialized domains. We propose TechniqueRAG, a domain-specific retrieval-augmented generation (RAG) framework that bridges this gap by integrating off-the-shelf retrievers, instruction-tuned LLMs, and minimal text-technique pairs. Our approach addresses data scarcity by fine-tuning only the generation component on limited in-domain examples, circumventing the need for resource-intensive retrieval training. While conventional RAG mitigates hallucination by coupling retrieval and generation, its reliance on generic retrievers often introduces noisy candidates, limiting domain-specific precision. To address this, we enhance retrieval quality and domain specificity through zero-shot LLM re-ranking, which explicitly aligns retrieved candidates with adversarial techniques. Experiments on multiple security benchmarks demonstrate that TechniqueRAG achieves state-of-the-art performance without extensive task-specific optimizations or labeled data, while comprehensive analysis provides further insights.
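
The retrieve, re-rank, then generate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-overlap `retrieve` stands in for an off-the-shelf retriever, `toy_judge` stands in for a zero-shot LLM relevance judgment, and `build_prompt` only assembles the input that a fine-tuned generator would receive. The abstract does not name a technique taxonomy; the three MITRE ATT&CK entries below are illustrative, with paraphrased descriptions rather than official text. All function and variable names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Technique:
    tech_id: str
    name: str
    description: str  # paraphrased for illustration, not official text


# Illustrative label space (assumption: the abstract names no taxonomy).
CORPUS = [
    Technique("T1003", "OS Credential Dumping",
              "Adversaries dump credentials from the operating system to obtain account logins."),
    Technique("T1059", "Command and Scripting Interpreter",
              "Adversaries run commands or a script through an interpreter such as PowerShell."),
    Technique("T1566", "Phishing",
              "Adversaries send email messages with a malicious attachment or link to gain access."),
]

QUERY = ("The attacker sent a spearphishing email with a malicious attachment, "
         "then ran a PowerShell script.")


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[Technique], k: int) -> List[Technique]:
    """Stand-in for a generic retriever: rank techniques by token overlap."""
    q = tokenize(query)

    def overlap(t: Technique) -> int:
        return len(q & tokenize(f"{t.name} {t.description}"))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def rerank(query: str, candidates: List[Technique],
           judge: Callable[[str, Technique], float], k: int) -> List[Technique]:
    """Re-order retrieved candidates by a relevance judge (an LLM in practice)."""
    return sorted(candidates, key=lambda t: judge(query, t), reverse=True)[:k]


# Hypothetical cue words; a real system would prompt an LLM instead.
CUES = {
    "T1003": {"credential", "credentials", "lsass"},
    "T1059": {"powershell", "script", "interpreter"},
    "T1566": {"phishing", "spearphishing"},
}


def toy_judge(query: str, technique: Technique) -> float:
    """Toy replacement for a zero-shot LLM score: count matching cue words."""
    return float(len(tokenize(query) & CUES.get(technique.tech_id, set())))


def build_prompt(query: str, candidates: List[Technique]) -> str:
    """Assemble the generator input from the re-ranked candidates."""
    lines = [f"- {t.tech_id} {t.name}: {t.description}" for t in candidates]
    return ("Candidate techniques:\n" + "\n".join(lines)
            + f"\n\nText: {query}\nWhich techniques does the text describe?")


retrieved = retrieve(QUERY, CORPUS, k=2)
reranked = rerank(QUERY, retrieved, toy_judge, k=2)
prompt = build_prompt(QUERY, reranked)
```

In this sketch only the judge and the generator would be model-backed; the retriever is untouched, which mirrors the abstract's point that the retrieval component needs no task-specific training.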
