optimize_anything: A Universal API for Optimizing Any Text Parameter

Abstract

Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system (supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs) achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple Gemini Flash's ARC-AGI accuracy (32.5% to 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels of which 87% match or beat PyTorch, and outperforms AlphaEvolve's reported circle packing solution (n=26). Ablations across three domains reveal two findings. First, actionable side information yields faster convergence and substantially higher final scores than score-only feedback. Second, multi-task search outperforms independent optimization under an equivalent per-problem budget through cross-task transfer, with benefits that grow with the number of related tasks. Together, these results show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks that traditionally required domain-specific algorithms under a single framework. We open-source optimize_anything, with support for multiple backends, as part of the GEPA project at https://github.com/gepa-ai/gepa.
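The formulation above (a text artifact, a scoring function, and side information fed back to a proposer) can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the optimize_anything API: the names `optimize_text`, `evaluate`, `propose`, and `TARGET` are hypothetical, and the proposer is a deterministic stand-in for an LLM that simply acts on the side information it receives.

```python
from typing import Callable, Tuple

TARGET = "cuda"  # toy ground truth; only the evaluator sees it (assumption for illustration)

def evaluate(candidate: str) -> Tuple[float, str]:
    """Return (score, side_info): the fraction of matching positions and the first mismatch."""
    padded = candidate.ljust(len(TARGET))[:len(TARGET)]
    score = sum(a == b for a, b in zip(padded, TARGET)) / len(TARGET)
    for i, (a, b) in enumerate(zip(padded, TARGET)):
        if a != b:
            return score, f"{i}:{b}"  # actionable hint: position i should be character b
    return score, ""  # no mismatch left, so there is no hint to give

def propose(current: str, side_info: str) -> str:
    """Hypothetical stand-in for an LLM proposer: apply the hint carried by side_info."""
    if not side_info:
        return current
    idx, ch = side_info.split(":", 1)
    i = int(idx)
    chars = list(current.ljust(i + 1))
    chars[i] = ch
    return "".join(chars)

def optimize_text(
    seed: str,
    evaluate: Callable[[str], Tuple[float, str]],
    propose: Callable[[str, str], str],
    budget: int = 10,
) -> Tuple[str, float]:
    """Greedy search: keep a candidate only if it strictly improves the score."""
    best = seed
    best_score, best_info = evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, best_info)
        score, info = evaluate(candidate)
        if score > best_score:
            best, best_score, best_info = candidate, score, info
    return best, best_score

result = optimize_text("xxxx", evaluate, propose, budget=10)
```

In this sketch the side information carries exactly what the proposer needs to fix next, which is the intuition behind "actionable" feedback; with a bare score, the proposer would have to guess which position to change. A real system would replace `propose` with an LLM call and `evaluate` with a domain-specific scorer.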