AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

October 21, 2025
Authors: Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Hai Wang, Cathy Wu, Jinhua Zhao
cs.AI

Abstract

Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or require costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback, without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65% to 72% accuracy as training items grow from 100 to 300) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.
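
To make the library structure concrete, the following is a minimal sketch in Python of the entry format and the two-phase cycle as described in the abstract. All identifiers (LibraryEntry, retrieve, library_learning_phase, library_evolution_phase), the keyword-overlap retrieval, and the numeric answer check are illustrative assumptions, not the released AlphaOPT implementation; the LLM and solver are passed in as opaque callables.

```python
# Hypothetical sketch of a self-improving experience library; not the AlphaOPT codebase.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LibraryEntry:
    """One stored insight: {taxonomy, condition, explanation, example}."""
    taxonomy: str     # e.g. "constraints / logical implication via big-M" (illustrative)
    condition: str    # when the insight applies; refined during Library Evolution
    explanation: str  # why the modeling choice is correct
    example: str      # short worked snippet illustrating the pattern


def keyword_overlap(condition: str, problem: str) -> int:
    """Toy retrieval score: number of words shared by the condition and the problem text."""
    return len(set(condition.lower().split()) & set(problem.lower().split()))


def retrieve(library: List[LibraryEntry], problem: str, k: int = 5) -> List[LibraryEntry]:
    """Return the k entries whose applicability conditions best match the problem text."""
    return sorted(library, key=lambda e: keyword_overlap(e.condition, problem), reverse=True)[:k]


def library_learning_phase(
    problem: str,
    answer: float,
    library: List[LibraryEntry],
    solve: Callable[[str, List[LibraryEntry]], float],      # LLM formulates, solver executes
    reflect: Callable[[str, float, float], LibraryEntry],    # LLM extracts an insight from a failure
    tol: float = 1e-6,
) -> None:
    """Phase (i): reflect on a failed attempt and store only solver-verified insights."""
    predicted = solve(problem, retrieve(library, problem))
    if abs(predicted - answer) > tol:                        # supervision from the answer alone
        candidate = reflect(problem, predicted, answer)
        if abs(solve(problem, [candidate]) - answer) <= tol:  # verify with the solver before storing
            library.append(candidate)


def library_evolution_phase(
    problem: str,
    answer: float,
    library: List[LibraryEntry],
    solve: Callable[[str, List[LibraryEntry]], float],
    refine_condition: Callable[[LibraryEntry, str], str],    # LLM rewrites an applicability condition
    tol: float = 1e-6,
) -> None:
    """Phase (ii): diagnose retrieval misalignment and refine applicability conditions."""
    retrieved = retrieve(library, problem)
    if abs(solve(problem, retrieved) - answer) > tol:        # useful insight missing or not retrieved
        for entry in retrieved:
            entry.condition = refine_condition(entry, problem)
```

In this sketch, only the library changes between cycles: the model weights are untouched, and each stored entry remains a human-readable record that can be inspected or edited directly.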