Favia: Forensic Agent for Vulnerability-fix Identification and Analysis

Abstract

Identifying the vulnerability-fixing commits that correspond to disclosed CVEs is essential for secure software maintenance but remains challenging at scale, because large repositories contain millions of commits, of which only a small fraction address security issues. Existing automated approaches, including traditional machine learning techniques and recent large language model (LLM)-based methods, often suffer from poor precision-recall trade-offs. We find that these approaches, which are frequently evaluated on randomly sampled commits, substantially underestimate real-world difficulty, where candidate commits are already security-relevant and highly similar to one another. We propose Favia, a forensic, agent-based framework for vulnerability-fix identification that combines scalable candidate ranking with deep, iterative semantic reasoning. Favia first employs an efficient ranking stage to narrow the search space of commits. Each remaining commit is then rigorously evaluated by a ReAct-based LLM agent. Given the pre-commit repository as its environment, along with specialized tools, the agent attempts to localize vulnerable components, navigates the codebase, and establishes causal alignment between the code changes and the root cause of the vulnerability. This evidence-driven process enables robust identification of indirect, multi-file, and non-trivial fixes that elude single-pass or similarity-based methods. We evaluate Favia on CVEVC, a large-scale dataset that we constructed, comprising over 8 million commits from 3,708 real-world repositories, and show that it consistently outperforms state-of-the-art traditional and LLM-based baselines under realistic candidate selection, achieving the strongest precision-recall trade-offs and the highest F1-scores.
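The two-stage structure described above (a cheap ranking pass followed by an expensive per-commit check) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from Favia itself: the `Commit` type, the Jaccard token-overlap score standing in for the ranking stage, and `stub_verifier`, a keyword check standing in for the ReAct agent's verdict.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Commit:
    # Hypothetical minimal commit record; a real system would carry far more context.
    sha: str
    message: str
    diff: str


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens; a crude stand-in for real ranking features."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def rank_candidates(cve_description: str, commits: List[Commit], top_k: int) -> List[Commit]:
    """Stage 1 (assumed scoring): Jaccard overlap between CVE text and commit text."""
    cve_tokens = tokenize(cve_description)

    def score(commit: Commit) -> float:
        tokens = tokenize(commit.message + " " + commit.diff)
        union = cve_tokens | tokens
        return len(cve_tokens & tokens) / len(union) if union else 0.0

    # Keep only the top_k highest-scoring commits for the expensive stage.
    return sorted(commits, key=score, reverse=True)[:top_k]


def identify_fixes(
    cve_description: str,
    commits: List[Commit],
    verify: Callable[[str, Commit], bool],
    top_k: int = 2,
) -> List[str]:
    """Stage 2: run the costly verifier only on the shortlisted candidates."""
    shortlist = rank_candidates(cve_description, commits, top_k)
    return [c.sha for c in shortlist if verify(cve_description, c)]


def stub_verifier(cve_description: str, commit: Commit) -> bool:
    """Placeholder verdict; a real agent would inspect the pre-commit repository."""
    return "bounds" in commit.diff.lower()


commits = [
    Commit("a1", "Update README", "docs: fix typo"),
    Commit("b2", "Fix buffer overflow in parser",
           "add bounds check before memcpy in parse_header"),
    Commit("c3", "Refactor parser buffer handling",
           "rename buffer variables in parse_header"),
]
cve = "Buffer overflow in parse_header allows remote attackers to crash the parser"
fixes = identify_fixes(cve, commits, stub_verifier, top_k=2)
print(fixes)
```

In this toy input, the refactoring commit `c3` scores nearly as high as the real fix `b2` on textual similarity alone, which mirrors the abstract's point that realistic candidates are highly similar and that a second, evidence-based stage is needed to separate them.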
