AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
October 21, 2025
Authors: Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Hai Wang, Cathy Wu, Jinhua Zhao
cs.AI
Abstract
Optimization modeling enables critical decisions across industries but
remains difficult to automate: informal language must be mapped to precise
mathematical formulations and executable solver code. Prior LLM approaches
either rely on brittle prompting or costly retraining with limited
generalization. We present AlphaOPT, a self-improving experience library that
enables an LLM to learn from limited demonstrations (even answers alone,
without gold-standard programs) and solver feedback, requiring neither annotated
reasoning traces nor parameter updates. AlphaOPT operates in a continual
two-phase cycle: (i) a Library Learning phase that reflects on failed attempts,
extracting solver-verified, structured insights as {taxonomy, condition,
explanation, example}; and (ii) a Library Evolution phase that diagnoses
retrieval misalignments and refines the applicability conditions of stored
insights, improving transfer across tasks. This design (1) learns efficiently
from limited demonstrations without curated rationales, (2) expands continually
without costly retraining by updating the library rather than model weights,
and (3) makes knowledge explicit and interpretable for human inspection and
intervention. Experiments show that AlphaOPT steadily improves with more data
(accuracy rises from 65% to 72% as training items grow from 100 to 300) and surpasses the strongest
baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only
on answers. Code and data are available at:
https://github.com/Minw913/AlphaOPT.
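
The abstract describes the experience library as structured insight records of the form {taxonomy, condition, explanation, example}, maintained by a two-phase Library Learning / Library Evolution loop. The following is a minimal, hypothetical Python sketch of that structure: the names Insight, ExperienceLibrary, library_learning, and library_evolution, and the toy keyword-overlap retrieval are illustrative assumptions, not the released implementation linked above, which would rely on LLM-driven reflection, retrieval, and solver verification.

from dataclasses import dataclass, field

# Hypothetical sketch of the insight record named in the abstract:
# {taxonomy, condition, explanation, example}.
@dataclass
class Insight:
    taxonomy: str      # e.g. "constraints/logical (big-M linking)"
    condition: str     # applicability condition used for retrieval
    explanation: str   # why the formulation choice is correct
    example: str       # a short model/solver-code snippet

@dataclass
class ExperienceLibrary:
    insights: list[Insight] = field(default_factory=list)

    def retrieve(self, task_description: str) -> list[Insight]:
        # Toy retrieval: keyword overlap between the task and each stored
        # condition. The real system would use LLM/embedding-based matching.
        words = set(task_description.lower().split())
        return [i for i in self.insights
                if words & set(i.condition.lower().split())]

def library_learning(library: ExperienceLibrary, attempt_failed: bool,
                     verified_insight: Insight) -> None:
    # Phase (i), sketched: after reflecting on a failed attempt, store only
    # an insight that passed solver verification.
    if attempt_failed and verified_insight is not None:
        library.insights.append(verified_insight)

def library_evolution(insight: Insight, refined_condition: str) -> None:
    # Phase (ii), sketched: a diagnosed retrieval misalignment triggers a
    # refinement of the stored insight's applicability condition.
    insight.condition = refined_condition

if __name__ == "__main__":
    lib = ExperienceLibrary()
    ins = Insight(
        taxonomy="constraints/big-M",
        condition="conditional activation of a constraint",
        explanation="Link a binary indicator y to x via x <= M * y.",
        example="model.addConstr(x <= M * y)",
    )
    library_learning(lib, attempt_failed=True, verified_insight=ins)
    library_evolution(ins, "either-or or conditional constraints")
    print(lib.retrieve("model either-or constraints for machine selection"))

In this sketch, updating the library rather than any model weights is what allows continual expansion, and keeping each insight as an explicit record is what makes the stored knowledge inspectable and editable by a human reviewer.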