Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

Abstract

Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores, used as exogenous variables, improve time-series forecasts. We evaluate several approaches to short-term forecasting of sighting counts per vulnerability. First, we test SARIMAX models with and without a log(x+1) transformation and with and without VLAI-derived severity inputs. These adjustments yield only limited improvements, and SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data: its forecasts often come with overly wide confidence intervals and sometimes take unrealistic negative values. To better capture the discrete, event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, which estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.
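To make the exponential decay alternative concrete, the following is a minimal sketch and not the method used in the study. It assumes daily sighting counts for a single vulnerability, estimates a decay rate with a least-squares line through log(count + 1), and projects the last observed count forward, clipping at zero so that no forecast is negative. The function names `fit_decay_rate` and `forecast_decay`, the fitting procedure, and the sample counts are all hypothetical choices made for illustration.

```python
import math


def fit_decay_rate(counts):
    """Estimate a non-negative decay rate from a short series of counts.

    Assumption (not from the study): the rate is the negated least-squares
    slope of log(count + 1) against the time index.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = list(range(n))
    ys = [math.log1p(c) for c in counts]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A rising series would give a negative rate; clamp it to a flat forecast.
    return max(0.0, -slope)


def forecast_decay(counts, horizon):
    """Project the last count forward as (last + 1) * exp(-rate * step) - 1."""
    rate = fit_decay_rate(counts)
    level = math.log1p(counts[-1])
    # Clip at zero: sighting counts cannot be negative.
    return [max(0.0, math.expm1(level - rate * step))
            for step in range(1, horizon + 1)]


if __name__ == "__main__":
    daily_sightings = [12, 7, 4, 2, 1, 1]  # illustrative values only
    print(forecast_decay(daily_sightings, horizon=3))
```

Working in log(x + 1) space keeps zero-count days usable, and the final clipping avoids the negative forecasts noted above for SARIMAX. The sketch produces point estimates only and says nothing about uncertainty.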