Cross-Domain Evaluation of Transformer-Based Vulnerability Detection on Open & Industry Data
September 11, 2025
Authors: Moritz Mock, Thomas Forrer, Barbara Russo
cs.AI
Abstract
Deep learning solutions for vulnerability detection proposed in academic
research are not always accessible to developers, and their applicability in
industrial settings is rarely addressed. Transferring such technologies from
academia to industry presents challenges related to trustworthiness, legacy
systems, limited digital literacy, and the gap between academic and industrial
expertise. For deep learning in particular, performance and integration into
existing workflows are additional concerns. In this work, we first evaluate the
performance of CodeBERT for detecting vulnerable functions in industrial and
open-source software. We analyse its cross-domain generalisation when
fine-tuned on open-source data and tested on industrial data, and vice versa,
also exploring strategies for handling class imbalance. Based on these results,
we develop AI-DO (Automating vulnerability detection Integration for Developers'
Operations), a Continuous Integration-Continuous Deployment (CI/CD)-integrated
recommender system that uses fine-tuned CodeBERT to detect and localise
vulnerabilities during code review without disrupting workflows. Finally, we
assess the tool's perceived usefulness through a survey with the company's IT
professionals. Our results show that models trained on industrial data detect
vulnerabilities accurately within the same domain but lose performance on
open-source code, while a deep learner fine-tuned on open-source data, with
appropriate undersampling techniques, improves the detection of
vulnerabilities.
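
The abstract names undersampling as one strategy for handling class imbalance but does not say which variant is used. As a rough illustration only, the sketch below applies plain random undersampling to a labelled set of functions before fine-tuning. The helper name `undersample`, the `ratio` parameter, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import random
from collections import Counter


def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop samples from larger classes.

    Assumption for this sketch: each class keeps at most
    ratio * (size of the smallest class) samples, with ratio >= 1.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    minority_size = min(len(group) for group in by_label.values())
    # Never cap below the minority size, so the minority class stays intact.
    cap = max(minority_size, int(minority_size * ratio))

    balanced = []
    for label, group in by_label.items():
        kept = group if len(group) <= cap else rng.sample(group, cap)
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 2 vulnerable (1) vs 8 non-vulnerable (0) functions.
functions = [f"func_{i}" for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

balanced = undersample(functions, labels, ratio=1.0, seed=42)
print(Counter(label for _, label in balanced))
```

With `ratio=1.0` both classes end up the same size. A larger ratio keeps more non-vulnerable functions, trading balance for a larger training set.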