optimize_anything: A Universal API for Optimizing Any Text Parameter

Abstract

Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system (supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs) achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (from 32.5% to 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels of which 87% match or beat PyTorch, and outperforms AlphaEvolve's reported circle packing solution (n=26). Ablations across three domains reveal that actionable side information yields faster convergence and substantially higher final scores than score-only feedback, and that, given an equivalent per-problem budget, multi-task search outperforms independent optimization through cross-task transfer, with benefits that grow with the number of related tasks. Together, these results show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying under a single framework tasks that traditionally require domain-specific algorithms. We open-source optimize_anything, with support for multiple backends, as part of the GEPA project at https://github.com/gepa-ai/gepa.
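The formulation described above (a text artifact, a scoring function, and optional side information returned to the proposer) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the toy task and the names `Evaluation`, `evaluate`, `propose_with_side_info`, `make_blind_proposer`, and `optimize` are illustrative only, not the optimize_anything API in the GEPA repository, and a deterministic rule stands in for the LLM proposer.

```python
import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy task (not from the paper): the text artifact is a
# comma-separated list of integers, and the hidden goal is to reproduce TARGET.
TARGET = [3, 1, 4, 1, 5]


@dataclass
class Evaluation:
    score: float  # higher is better; 0 is the best possible score for this toy task
    side_info: dict = field(default_factory=dict)  # actionable diagnostics for the proposer


def evaluate(artifact: str) -> Evaluation:
    """Score a candidate artifact and report which positions are off, and by how much."""
    try:
        values = [int(tok) for tok in artifact.split(",")]
    except ValueError:
        return Evaluation(float("-inf"), {"error": "artifact is not a list of integers"})
    if len(values) != len(TARGET):
        return Evaluation(float("-inf"), {"error": f"expected {len(TARGET)} values, got {len(values)}"})
    # Signed gap per wrong position: positive means the value should increase.
    off_by = {i: t - v for i, (v, t) in enumerate(zip(values, TARGET)) if v != t}
    return Evaluation(-sum(abs(d) for d in off_by.values()), {"off_by": off_by})


def propose_with_side_info(artifact: str, ev: Evaluation) -> str:
    """Hypothetical stand-in for an LLM proposer: reads side info, makes a targeted edit."""
    values = [int(tok) for tok in artifact.split(",")]
    off_by = ev.side_info.get("off_by", {})
    if off_by:
        i, gap = next(iter(off_by.items()))  # nudge the first wrong position by one step
        values[i] += 1 if gap > 0 else -1
    return ",".join(str(v) for v in values)


def make_blind_proposer(seed: int) -> Callable[[str, Evaluation], str]:
    """Score-only baseline: ignores side info and perturbs a random position."""
    rng = random.Random(seed)

    def propose(artifact: str, ev: Evaluation) -> str:
        values = [int(tok) for tok in artifact.split(",")]
        values[rng.randrange(len(values))] += rng.choice([-1, 1])
        return ",".join(str(v) for v in values)

    return propose


def optimize(
    seed: str,
    evaluate_fn: Callable[[str], Evaluation],
    propose_fn: Callable[[str, Evaluation], str],
    budget: int,
) -> tuple[str, float]:
    """Greedy search over text artifacts: keep a candidate only if its score improves."""
    best, best_ev = seed, evaluate_fn(seed)
    for _ in range(budget):
        candidate = propose_fn(best, best_ev)
        candidate_ev = evaluate_fn(candidate)
        if candidate_ev.score > best_ev.score:
            best, best_ev = candidate, candidate_ev
    return best, best_ev.score


if __name__ == "__main__":
    print(optimize("0,0,0,0,0", evaluate, propose_with_side_info, budget=20))  # → ('3,1,4,1,5', 0)
    print(optimize("0,0,0,0,0", evaluate, make_blind_proposer(0), budget=20))
```

In this toy, the side-info proposer reaches the target in 14 accepted edits (the total gap from the seed), while the blind proposer can only find improving moves by trial and error under the same budget. It illustrates the mechanism only and does not reproduce the paper's ablations.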