LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models

Abstract

Large Language Models (LLMs) have transformed software development and automated code generation. Motivated by these advances, this paper explores the feasibility of using LLMs to modify malware source code and generate variants. We introduce LLMalMorph, a semi-automated framework that leverages the semantic and syntactic code comprehension of LLMs to generate new malware variants. LLMalMorph extracts function-level information from the malware source code and combines custom-engineered prompts with strategically defined code transformations to guide the LLM in generating variants without resource-intensive fine-tuning. To evaluate LLMalMorph, we collected 10 Windows malware samples of varying types, complexity, and functionality, and generated 618 variants. Our experiments demonstrate that it is possible to reduce, to some extent, the rates at which antivirus engines detect these variants while preserving the malware's functionality. In addition, although we did not optimize against any Machine Learning (ML)-based malware detector, several variants achieved notable attack success rates against an ML-based malware classifier. We also discuss the limitations of current LLMs in generating malware variants from source code and assess where this emerging technology stands in the broader context of malware variant generation.
