optimize_anything: A Universal API for Optimizing Any Text Parameter

Abstract

Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system (supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs) achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (32.5% to 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels of which 87% match or beat PyTorch, and outperforms AlphaEvolve's reported circle packing solution (n=26). Ablations across three domains reveal that actionable side information yields faster convergence and substantially higher final scores than score-only feedback, and that, given an equivalent per-problem budget, multi-task search outperforms independent optimization through cross-task transfer, with benefits that grow with the number of related tasks. Together, these results show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks that traditionally require domain-specific algorithms under a single framework. We open-source optimize_anything, with support for multiple backends, as part of the GEPA project at https://github.com/gepa-ai/gepa.
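The formulation in the abstract (a text artifact, a scoring function, and optional side information fed back to the search) can be illustrated with a minimal sketch. This is not the optimize_anything API: the names `optimize_text`, `evaluate`, and `propose` are hypothetical, the search is a plain greedy loop chosen for brevity, and the rule-based `propose` function only stands in for what would be an LLM call in a real system.

```python
from typing import Callable, Tuple

# Hypothetical types for illustration only (not the GEPA / optimize_anything API).
# An evaluator maps a text artifact to (score, side_information).
Evaluator = Callable[[str], Tuple[float, str]]
# A proposer maps (current text, its score, its side information) to a new text.
Proposer = Callable[[str, float, str], str]


def optimize_text(seed: str, evaluate: Evaluator, propose: Proposer,
                  budget: int = 10) -> Tuple[str, float]:
    """Greedy search: keep a proposed candidate only if it scores strictly higher."""
    best = seed
    best_score, best_info = evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, best_score, best_info)
        score, info = evaluate(candidate)
        if score > best_score:
            best, best_score, best_info = candidate, score, info
    return best, best_score


def evaluate(text: str) -> Tuple[float, str]:
    """Toy scoring function: the artifact is the text 'x=<number>', best at x = 3."""
    x = float(text.split("=", 1)[1])
    score = -(x - 3.0) ** 2
    if x < 3.0:
        info = "too low"      # actionable side information, not just a score
    elif x > 3.0:
        info = "too high"
    else:
        info = "on target"
    return score, info


def propose(text: str, score: float, info: str) -> str:
    """Stand-in for an LLM proposer: moves x by a fixed step based on side information."""
    x = float(text.split("=", 1)[1])
    if info == "too low":
        x += 0.5
    elif info == "too high":
        x -= 0.5
    return f"x={x}"


if __name__ == "__main__":
    best, best_score = optimize_text("x=0.0", evaluate, propose, budget=10)
    print(best, best_score)
```

In this toy setting the side information ("too low" or "too high") tells the proposer which direction to move, which is the kind of actionable feedback the ablations compare against score-only feedback. A score-only variant would have to guess the direction.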