FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Abstract

Fuzzing is an important dynamic program analysis technique for finding vulnerabilities in complex software. It presents a target program with crafted malicious inputs intended to cause crashes, buffer overflows, memory errors, and exceptions. Crafting such inputs efficiently is a difficult open problem, and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose fine-tuned large language models (FuzzCoder) that learn patterns in the input files of successful attacks to guide future fuzzing exploration. Specifically, we develop a framework that leverages code LLMs to guide the mutation of inputs during fuzzing. The mutation process is formulated as sequence-to-sequence modeling, in which the LLM receives a sequence of bytes and outputs the mutated byte sequence. FuzzCoder is fine-tuned on a purpose-built instruction dataset (Fuzz-Instruct) consisting of successful fuzzing history collected from a heuristic fuzzing tool. FuzzCoder can predict mutation positions and mutation strategies in input files to trigger abnormal program behavior. Experimental results show that FuzzCoder, built on AFL (American Fuzzy Lop), achieves significant improvements in effective proportion of mutation (EPM) and number of crashes (NC) across various input formats, including ELF, JPG, MP3, and XML.
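
To make the formulation concrete, the following is a minimal sketch of how predicted (position, strategy) pairs might be applied to a seed byte sequence. It is an illustration under stated assumptions, not the FuzzCoder implementation: the strategy names, the function `apply_mutations`, and the hard-coded `predictions` list (standing in for model output) are all hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical byte-level strategies, loosely modeled on AFL-style operations.
def flip_bit(data: bytearray, pos: int) -> None:
    data[pos] ^= 0x01  # flip the lowest bit of the byte


def flip_byte(data: bytearray, pos: int) -> None:
    data[pos] ^= 0xFF  # invert every bit of the byte


def add_one(data: bytearray, pos: int) -> None:
    data[pos] = (data[pos] + 1) % 256  # arithmetic increment with wraparound


STRATEGIES: Dict[str, Callable[[bytearray, int], None]] = {
    "flip_bit": flip_bit,
    "flip_byte": flip_byte,
    "add_one": add_one,
}


def apply_mutations(seed: bytes, predictions: List[Tuple[int, str]]) -> bytes:
    """Apply each predicted (position, strategy) pair to a copy of the seed.

    Out-of-range positions and unknown strategy names are skipped, since a
    model's output cannot be assumed to be well formed.
    """
    mutated = bytearray(seed)
    for pos, name in predictions:
        if 0 <= pos < len(mutated) and name in STRATEGIES:
            STRATEGIES[name](mutated, pos)
    return bytes(mutated)


# Assumption: these pairs stand in for what a fine-tuned model would predict.
seed = b"\x7fELF\x02\x01"
predictions = [(0, "flip_byte"), (4, "add_one"), (5, "flip_bit"), (99, "flip_bit")]
mutated = apply_mutations(seed, predictions)
print(mutated.hex())
```

In this sketch the last pair is ignored because position 99 lies outside the six-byte seed. In a real fuzzing loop the mutated bytes would be passed to the target program and the outcome recorded.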