LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
July 12, 2025
作者: Md Ajwad Akil, Adrian Shuai Li, Imtiaz Karim, Arun Iyengar, Ashish Kundu, Vinny Parla, Elisa Bertino
cs.AI
Abstract
Large Language Models (LLMs) have transformed software development and
automated code generation. Motivated by these advancements, this paper explores
the feasibility of using LLMs to modify malware source code and generate
variants. We introduce LLMalMorph, a semi-automated framework that leverages the
semantic and syntactic code comprehension of LLMs to generate new malware variants.
LLMalMorph extracts function-level information from the malware source code and
employs custom-engineered prompts coupled with strategically defined code
transformations to guide the LLM in generating variants without
resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 diverse
Windows malware samples of varying types, complexity and functionality and
generated 618 variants. Our thorough experiments demonstrate that it is
possible to reduce antivirus engines' detection rates for these variants to
some extent while preserving their malware functionality. In addition,
despite not optimizing against any Machine Learning (ML)-based malware
detectors, several variants also achieved notable attack success rates against
an ML-based malware classifier. We also discuss the limitations of current LLM
capabilities in generating malware variants from source code and assess where
this emerging technology stands in the broader context of malware variant
generation.