Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?
May 23, 2023
作者: Aaron Chan, Anant Kharkar, Roshanak Zilouchian Moghaddam, Yevhen Mohylevskyy, Alec Helyar, Eslam Kamal, Mohamed Elkamhawy, Neel Sundaresan
cs.AI
Abstract
Software vulnerabilities impose significant costs on enterprises. Despite
extensive research and development of software vulnerability detection
methods, uncaught vulnerabilities continue to put software owners and users
at risk. Many current vulnerability detection methods require that code
snippets compile and build before detection is attempted. This,
unfortunately, introduces a long latency between the time a vulnerability is
injected and the time it is removed, which can substantially increase the
cost of fixing a vulnerability. We recognize that current advances in
machine learning can be used to detect vulnerable code patterns in
syntactically incomplete code snippets as the developer writes the code at
EditTime. In this paper we present a practical system that leverages deep
learning on a large-scale dataset of vulnerable code patterns to learn
complex manifestations of more than 250 vulnerability types and to detect
vulnerable code patterns at EditTime. We discuss zero-shot, few-shot, and
fine-tuning approaches on state-of-the-art pre-trained Large Language Models
(LLMs). We show that our approach improves on state-of-the-art vulnerability
detection models by 10%. We also evaluate our approach for detecting
vulnerabilities in code auto-generated by code LLMs. Evaluation on a
benchmark of high-risk code scenarios shows a reduction in vulnerabilities
of up to 90%.
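
To make the fine-tuning approach mentioned in the abstract concrete, the
sketch below fine-tunes a pre-trained code encoder as a binary
vulnerable/benign classifier over raw code snippets. This is a minimal
sketch, not the paper's implementation: the base model
(microsoft/codebert-base), the JSONL data layout with "code" and "label"
fields, and all hyperparameters are assumptions made for illustration.

```python
# Minimal sketch: fine-tune a pre-trained code model as a binary
# vulnerable/benign snippet classifier. Model name, data files, and
# hyperparameters are illustrative assumptions, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "microsoft/codebert-base"  # assumed base encoder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2
)

# Hypothetical JSONL files, one record per snippet:
# {"code": "<snippet text>", "label": 0 or 1}  (1 = vulnerable)
data = load_dataset(
    "json", data_files={"train": "train.jsonl", "test": "test.jsonl"}
)

def tokenize(batch):
    # Snippets are treated as plain text, so they need not compile or
    # parse, matching the EditTime setting of syntactically incomplete code.
    return tokenizer(batch["code"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="vuln-detector",
        per_device_train_batch_size=16,
        num_train_epochs=3,
    ),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```

At inference time, such a classifier can score the snippet under the
developer's cursor as it is typed and raise a warning when the predicted
probability of the vulnerable class exceeds a chosen threshold.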