AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Abstract

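Optimization modeling enables important decisions in industry, but automating it remains difficult: informal language must be mapped to a precise mathematical formulation and to executable solver code. Earlier LLM approaches either depended on brittle prompts or required costly retraining that generalized only to a limited degree. This paper introduces AlphaOPT, a self-improving experience library that lets an LLM learn from limited demonstrations (even answers alone, without gold-standard programs) and from solver feedback. AlphaOPT requires neither annotated reasoning traces nor parameter updates. It operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts and extracts solver-verified, structured insights in the form {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining, because it updates the library rather than the model weights, and (3) makes knowledge explicit and interpretable, so that humans can inspect it and intervene. Experiments show that AlphaOPT improves steadily with more data (from 65% to 72% as training items increase from 100 to 300) and that, when trained on answers alone, it surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset. Code and data are available at: https://github.com/Minw913/AlphaOPT.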

English

Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches rely either on brittle prompting or on costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback, without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT improves steadily with more data (from 65% to 72% as training items increase from 100 to 300) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.
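
The two-phase cycle described in the abstract can be pictured with a small data-structure sketch. This is not the authors' implementation: only the four insight fields {taxonomy, condition, explanation, example} come from the abstract. The class names `Insight` and `ExperienceLibrary`, the methods `add`, `retrieve`, and `refine`, and the word-overlap matching rule are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Insight:
    """One library entry with the four fields named in the abstract."""
    taxonomy: str     # category label (hypothetical naming scheme)
    condition: str    # when the insight applies (plain text in this sketch)
    explanation: str  # what to do and why
    example: str      # short illustrative snippet


@dataclass
class ExperienceLibrary:
    """Hypothetical container; the real library may be organized differently."""
    insights: List[Insight] = field(default_factory=list)

    def add(self, insight: Insight) -> None:
        # Library Learning: store an insight extracted from a failed attempt.
        # Solver verification is assumed to have happened before this call.
        self.insights.append(insight)

    def retrieve(self, task: str) -> List[Insight]:
        # Assumed matching rule: an insight applies when every word of its
        # condition occurs in the task description.
        words = set(task.lower().split())
        return [i for i in self.insights
                if set(i.condition.lower().split()) <= words]

    def refine(self, index: int, new_condition: str) -> None:
        # Library Evolution: rewrite the applicability condition of a stored
        # insight after a retrieval mismatch has been diagnosed.
        self.insights[index].condition = new_condition


library = ExperienceLibrary()
library.add(Insight(
    taxonomy="variables/integrality",
    condition="trucks",
    explanation="Counts of indivisible items should be integer variables.",
    example="x_trucks integer, x_trucks >= 0",
))

# The condition is too narrow: a task about buses does not retrieve the insight.
missed = library.retrieve("decide how many buses to schedule")

# After refinement, the same insight transfers to the bus task as well.
library.refine(0, "how many")
found = library.retrieve("decide how many buses to schedule")
```

In this sketch the model weights never change; only the stored conditions do, which is the property the abstract highlights when it says the library is updated instead of the model.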

AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
