Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints
April 17, 2026
Authors: Cedric Bonhomme, Alexandre Dulaunoy
cs.AI
Abstract
Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals and sometimes unrealistic negative values. To better capture the discrete and event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, to estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.
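As an illustration of the exponential-decay alternative mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that daily sighting counts for one vulnerability decay log-linearly after the initial burst, fits a straight line to log(x+1) of the counts by ordinary least squares, and clamps forecasts at zero so that negative values cannot occur. The function names `fit_exponential_decay` and `forecast_sightings` and the sample counts are hypothetical.

```python
import math


def fit_exponential_decay(counts):
    """Fit log(count + 1) = a - lam * t by ordinary least squares.

    `counts` holds sightings per time step (t = 0, 1, 2, ...).
    Returns (a, lam); lam > 0 means activity is decaying.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = list(range(n))
    ys = [math.log1p(c) for c in counts]  # log(x + 1) handles zero counts
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, -slope


def forecast_sightings(counts, horizon):
    """Extrapolate the fitted decay curve `horizon` steps past the data.

    Forecasts are mapped back from the log(x + 1) scale with expm1 and
    clamped at zero, so they can never be negative.
    """
    a, lam = fit_exponential_decay(counts)
    n = len(counts)
    return [max(0.0, math.expm1(a - lam * (n + h))) for h in range(horizon)]


if __name__ == "__main__":
    # Hypothetical daily sightings after a proof-of-concept release.
    observed = [12, 7, 4, 2, 1, 1, 0]
    print(forecast_sightings(observed, horizon=3))
```

This sketch needs only a handful of observations, which matches the short-horizon setting described in the abstract, but it provides no uncertainty estimate and cannot anticipate a second burst of activity.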