LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
July 12, 2025
Authors: Md Ajwad Akil, Adrian Shuai Li, Imtiaz Karim, Arun Iyengar, Ashish Kundu, Vinny Parla, Elisa Bertino
cs.AI
Abstract
Large Language Models (LLMs) have transformed software development and
automated code generation. Motivated by these advancements, this paper explores
the feasibility of LLMs in modifying malware source code to generate variants.
We introduce LLMalMorph, a semi-automated framework that leverages the semantic
and syntactic code comprehension of LLMs to generate new malware variants.
LLMalMorph extracts function-level information from the malware source code and
employs custom-engineered prompts coupled with strategically defined code
transformations to guide the LLM in generating variants without
resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse
Windows malware samples of varying types, complexity and functionality and
generated 618 variants. Our thorough experiments demonstrate that the detection
rates of antivirus engines on these variants can be reduced to some extent while
preserving malware functionality. In addition,
despite not optimizing against any Machine Learning (ML)-based malware
detectors, several variants also achieved notable attack success rates against
an ML-based malware classifier. We also discuss the limitations of current LLM
capabilities in generating malware variants from source code and assess where
this emerging technology stands in the broader context of malware variant
generation.