Predicting the Unpredictable: Reproducible BiLSTM Forecasting of Incident Counts in the Global Terrorism Database (GTD)
October 16, 2025
Author: Oluwasegun Adegoke
cs.AI
Abstract
We study short-horizon forecasting of weekly terrorism incident counts using
the Global Terrorism Database (GTD, 1970–2016). We build a reproducible
pipeline with fixed time-based splits and evaluate a Bidirectional LSTM
(BiLSTM) against strong classical anchors (seasonal-naive, linear/ARIMA) and a
deep LSTM-Attention baseline. On the held-out test set, the BiLSTM attains an RMSE
of 6.38, outperforming LSTM-Attention (RMSE 9.19; 30.6% lower error) and a linear
lag-regression baseline (35.4% lower RMSE), with parallel improvements in MAE and MAPE.
Ablations varying temporal memory, training-history length, spatial grain,
lookback size, and feature groups show that models trained on long historical
data generalize best; a moderate lookback (20–30 weeks) provides strong
context; and bidirectional encoding is critical for capturing both build-up and
aftermath patterns within the window. Feature-group analysis indicates that
short-horizon structure (lagged counts and rolling statistics) contributes
most, with geographic and casualty features adding incremental lift. We release
code, configs, and compact result tables, and provide a data/ethics statement
documenting GTD licensing and research-only use. Overall, the study offers a
transparent, baseline-beating reference for GTD incident forecasting.
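
To make the evaluation terms in the abstract concrete, the following is a minimal sketch, not the author's released pipeline. It shows a chronological split, a seasonal-naive anchor, lookback windowing, and the RMSE/MAE/MAPE metrics, using synthetic Poisson counts instead of GTD data. All function names, the 70/15/15 split, the 52-week season, the 24-week lookback (chosen from the 20–30 week range reported above), and the MAPE floor `eps` are illustrative assumptions.

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat, eps=1.0):
    # eps floors the denominator so zero-count weeks do not divide by zero
    # (an assumption; the paper's exact MAPE handling is not stated here).
    return float(100.0 * np.mean(np.abs(y - yhat) / np.maximum(np.abs(y), eps)))

def relative_gain(baseline_rmse, model_rmse):
    # Percentage reduction in error relative to the baseline.
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

def time_split(series, train_frac=0.70, val_frac=0.15):
    # Chronological split: no shuffling, so the test set is strictly later in time.
    n = len(series)
    i = round(n * train_frac)
    j = i + round(n * val_frac)
    return series[:i], series[i:j], series[j:]

def seasonal_naive(history, horizon, season=52):
    # Forecast each future week with the value from the same week one season earlier.
    reps = int(np.ceil(horizon / season))
    return np.tile(history[-season:], reps)[:horizon]

def make_windows(series, lookback=24):
    # Each row of X holds `lookback` consecutive weeks; y is the week that follows.
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# Synthetic weekly counts with a yearly cycle (stand-in data, not GTD).
rng = np.random.default_rng(0)
weeks = np.arange(520)
counts = rng.poisson(20 + 5 * np.sin(2 * np.pi * weeks / 52)).astype(float)

train, val, test = time_split(counts)
pred = seasonal_naive(np.concatenate([train, val]), len(test))
print("seasonal-naive RMSE/MAE/MAPE:", rmse(test, pred), mae(test, pred), mape(test, pred))

X, y = make_windows(train, lookback=24)
print("window shapes:", X.shape, y.shape)

# Arithmetic check of the reported gain over LSTM-Attention (9.19 -> 6.38).
print(round(relative_gain(9.19, 6.38), 1))  # 30.6
```

The windows produced by `make_windows` are the kind of input a sequence model such as a BiLSTM would consume. The seasonal-naive forecast gives a floor that any learned model should beat on the same held-out weeks.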