AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
October 21, 2025
Authors: Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Hai Wang, Cathy Wu, Jinhua Zhao
cs.AI
Abstract
Optimization modeling enables critical decisions across industries but
remains difficult to automate: informal language must be mapped to precise
mathematical formulations and executable solver code. Prior LLM approaches
either rely on brittle prompting or costly retraining with limited
generalization. We present AlphaOPT, a self-improving experience library that
enables an LLM to learn from limited demonstrations (even answers alone,
without gold-standard programs) and solver feedback, requiring neither annotated
reasoning traces nor parameter updates. AlphaOPT operates in a continual
two-phase cycle: (i) a Library Learning phase that reflects on failed attempts,
extracting solver-verified, structured insights as {taxonomy, condition,
explanation, example}; and (ii) a Library Evolution phase that diagnoses
retrieval misalignments and refines the applicability conditions of stored
insights, improving transfer across tasks. This design (1) learns efficiently
from limited demonstrations without curated rationales, (2) expands continually
without costly retraining by updating the library rather than model weights,
and (3) makes knowledge explicit and interpretable for human inspection and
intervention. Experiments show that AlphaOPT steadily improves with more data
(accuracy rises from 65% to 72% as training items grow from 100 to 300) and surpasses the strongest
baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only
on answers. Code and data are available at:
https://github.com/Minw913/AlphaOPT.
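
The abstract describes the experience library as structured insight records of the form {taxonomy, condition, explanation, example}, maintained by a two-phase Library Learning / Library Evolution loop. The following is a minimal, hypothetical Python sketch of that structure: the names Insight, ExperienceLibrary, library_learning, and library_evolution, and the toy keyword-overlap retrieval are illustrative assumptions, not the released implementation linked above, which would rely on LLM-driven reflection, retrieval, and solver verification.

from dataclasses import dataclass, field

# Hypothetical sketch of the insight record named in the abstract:
# {taxonomy, condition, explanation, example}.
@dataclass
class Insight:
    taxonomy: str      # e.g. "constraints/logical (big-M linking)"
    condition: str     # applicability condition used for retrieval
    explanation: str   # why the formulation choice is correct
    example: str       # a short model/solver-code snippet

@dataclass
class ExperienceLibrary:
    insights: list[Insight] = field(default_factory=list)

    def retrieve(self, task_description: str) -> list[Insight]:
        # Toy retrieval: keyword overlap between the task and each stored
        # condition. The real system would use LLM/embedding-based matching.
        words = set(task_description.lower().split())
        return [i for i in self.insights
                if words & set(i.condition.lower().split())]

def library_learning(library: ExperienceLibrary, attempt_failed: bool,
                     verified_insight: Insight) -> None:
    # Phase (i), sketched: after reflecting on a failed attempt, store only
    # an insight that passed solver verification.
    if attempt_failed and verified_insight is not None:
        library.insights.append(verified_insight)

def library_evolution(insight: Insight, refined_condition: str) -> None:
    # Phase (ii), sketched: a diagnosed retrieval misalignment triggers a
    # refinement of the stored insight's applicability condition.
    insight.condition = refined_condition

if __name__ == "__main__":
    lib = ExperienceLibrary()
    ins = Insight(
        taxonomy="constraints/big-M",
        condition="conditional activation of a constraint",
        explanation="Link a binary indicator y to x via x <= M * y.",
        example="model.addConstr(x <= M * y)",
    )
    library_learning(lib, attempt_failed=True, verified_insight=ins)
    library_evolution(ins, "either-or or conditional constraints")
    print(lib.retrieve("model either-or constraints for machine selection"))

In this sketch, updating the library rather than any model weights is what allows continual expansion, and keeping each insight as an explicit record is what makes the stored knowledge inspectable and editable by a human reviewer.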