Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

Abstract

Identifying the vulnerability-fixing commits that correspond to disclosed CVEs is essential for secure software maintenance but remains challenging at scale, because large repositories contain millions of commits, of which only a small fraction address security issues. Existing automated approaches, including traditional machine learning techniques and recent large language model (LLM)-based methods, often suffer from poor precision-recall trade-offs. We find that these methods, which are frequently evaluated on randomly sampled commits, substantially underestimate real-world difficulty, where candidate commits are already security-relevant and highly similar to one another. We propose Favia, a forensic, agent-based framework for vulnerability-fix identification that combines scalable candidate ranking with deep, iterative semantic reasoning. Favia first employs an efficient ranking stage to narrow the search space of commits. Each remaining commit is then rigorously evaluated by a ReAct-based LLM agent. Given the pre-commit repository as its environment, along with specialized tools, the agent attempts to localize vulnerable components, navigate the codebase, and establish causal alignment between the code changes and the root cause of the vulnerability. This evidence-driven process enables robust identification of indirect, multi-file, and non-trivial fixes that elude single-pass or similarity-based methods. We evaluate Favia on CVEVC, a large-scale dataset we constructed that comprises over 8 million commits from 3,708 real-world repositories, and show that it consistently outperforms state-of-the-art traditional and LLM-based baselines under realistic candidate selection, achieving the strongest precision-recall trade-offs and the highest F1-scores.
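The two-stage structure described in the abstract (a cheap ranking stage followed by an expensive per-commit judgment) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not Favia's implementation: the `Commit` type, the Jaccard word-overlap ranker, the `top_k` cutoff, and the `is_fix` callback that stands in for the ReAct agent's verdict are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Commit:
    # Hypothetical minimal commit record; a real system would also carry the diff.
    sha: str
    message: str


def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for a real text representation."""
    return set(text.lower().split())


def rank_candidates(cve_description: str, commits: List[Commit], top_k: int) -> List[Commit]:
    """Stage 1 (assumed ranker): sort by Jaccard overlap with the CVE text, keep top_k."""
    query = tokens(cve_description)

    def score(commit: Commit) -> float:
        words = tokens(commit.message)
        union = query | words
        return len(query & words) / len(union) if union else 0.0

    return sorted(commits, key=score, reverse=True)[:top_k]


def identify_fixes(
    cve_description: str,
    commits: List[Commit],
    top_k: int,
    is_fix: Callable[[str, Commit], bool],
) -> List[Commit]:
    """Stage 2 (assumed): apply the expensive judgment only to the shortlist."""
    shortlist = rank_candidates(cve_description, commits, top_k)
    return [c for c in shortlist if is_fix(cve_description, c)]


# Toy data, invented for this example.
commits = [
    Commit("a1", "fix heap buffer overflow in png parser"),
    Commit("b2", "update README badges"),
    Commit("c3", "refactor png parser tests"),
]
cve = "heap buffer overflow in png parser"


def keyword_verdict(cve_text: str, commit: Commit) -> bool:
    # Placeholder for the agent's evidence-based verdict; here just a keyword check.
    return "overflow" in commit.message


result = identify_fixes(cve, commits, top_k=2, is_fix=keyword_verdict)
print([c.sha for c in result])
```

The point of the split is cost: the ranker touches every commit but is cheap, while the judgment step, which in the paper is an agent exploring the repository, runs only on the shortlist.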
