SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction

Abstract

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only the first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on the longitudinal Swedish LISA register from 1990 to 2022, comprising 2,143,817 individuals and 61,284,903 person-years, the model forecasts annual labor earnings at horizons of one to thirty years and aggregates the forecasts by Monte Carlo simulation into distributions of present-discounted lifetime earnings. Compared with the canonical parametric process of Guvenen, Karahan, Ozkan, and Song (GKOS) and with tabular and recurrent baselines, SAGA reduces the continuous ranked probability score (CRPS) by 31.9 percent at the ten-year horizon and the mean absolute error by 37.7 percent at the twenty-year horizon. The conformal intervals attain coverage within 0.4 percentage points of the nominal level marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed Gini coefficient of lifetime earnings is 0.327, compared with 0.341 for the partially observed ground truth and 0.378 for the GKOS estimate. Model weights, calibration tables, and a synthetic equivalent dataset are released to support replication outside the protected SCB MONA environment.
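The abstract names several standard building blocks without spelling them out. Below is a minimal Python sketch of generic versions of them, using only the standard library. It covers four pieces:

- a split conformal interval built from the finite-sample quantile rank `ceil((n + 1) * (1 - alpha))`;
- a sample-based CRPS estimate;
- Monte Carlo present discounting of simulated annual earnings paths;
- a Gini coefficient.

Everything here is a hypothetical illustration, not SAGA's actual implementation. That includes the absolute-residual nonconformity score, the 3 percent discount rate, and every function name. `toy_sample_path` is a log-normal random walk that stands in for draws from the trained transformer. The sketch covers only plain split conformal prediction, not any adaptive temporal variant.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Split conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score.

    If calibration and test points are exchangeable, an interval built from this
    threshold covers the truth with probability at least 1 - alpha (marginally).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(scores)
    # Tiny tolerance so floating-point noise cannot push the rank up by one.
    k = max(1, math.ceil((n + 1) * (1 - alpha) - 1e-12))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def split_conformal_interval(point_forecast, cal_forecasts, cal_actuals, alpha=0.1):
    """Interval point_forecast +/- q, where q is the conformal quantile of the
    calibration residuals. ASSUMPTION: absolute-residual nonconformity score."""
    scores = [abs(y - f) for f, y in zip(cal_forecasts, cal_actuals)]
    q = conformal_quantile(scores, alpha)
    return point_forecast - q, point_forecast + q


def crps_ensemble(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'| over forecast draws."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return term1 - term2


def present_discounted_value(path, rate):
    """Discount annual earnings (path[0] is year 1) back to year 0 at a constant rate."""
    return sum(e / (1 + rate) ** t for t, e in enumerate(path, start=1))


def toy_sample_path(rng, start_earnings=300_000.0, horizon=30, drift=0.01, sigma=0.1):
    """HYPOTHETICAL stand-in for one earnings path sampled from a trained model:
    a log-normal random walk, used only so this sketch runs end to end."""
    path, log_e = [], math.log(start_earnings)
    for _ in range(horizon):
        log_e += drift + rng.gauss(0.0, sigma)
        path.append(math.exp(log_e))
    return path


def monte_carlo_lifetime_pdv(sample_path, n_draws, rate, rng):
    """Draw n_draws simulated paths and return their present-discounted values."""
    return [present_discounted_value(sample_path(rng), rate) for _ in range(n_draws)]


def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical starting annual earnings for five individuals.
    starts = [150_000.0, 250_000.0, 300_000.0, 450_000.0, 800_000.0]
    pooled = []
    for s in starts:
        # ASSUMED 3% discount rate and 200 Monte Carlo draws per individual.
        pooled += monte_carlo_lifetime_pdv(
            lambda r, s=s: toy_sample_path(r, start_earnings=s), 200, 0.03, rng
        )
    print("Gini of pooled simulated lifetime PDV:", round(gini(pooled), 3))
    lo, hi = split_conformal_interval(100.0, [0.0] * 4, [1.0, -2.0, 3.0, -4.0], alpha=0.5)
    print("toy conformal interval:", (lo, hi))
```

Run as a script, the sketch prints two things: a Gini coefficient for the pooled simulated lifetime values, and a toy conformal interval. Pooling draws across individuals mixes within-person and between-person variation. The printed numbers reflect the toy sampler only and say nothing about the results reported in the abstract.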