SePO: A Self-Evolving Prompt Agent for System Prompt Optimization

Abstract

System prompt optimization improves agent behavior without modifying the underlying model, and it yields human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet leave the prompt agent's own system prompt hand-engineered and fixed. We propose Self-Evolving Prompt Optimization (SePO), which treats the prompt agent's own system prompt as an optimization target alongside the task agents' system prompts. SePO adopts a self-referential design: a single prompt agent improves both the task agents' system prompts and its own, using an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones. Training proceeds in two stages: pre-training evolves the prompt agent on a multi-task pool, and fine-tuning then applies it to a target task. Across five benchmarks spanning math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and logic puzzles (Sudoku), SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO, improving average accuracy by 4.49 points over Manual-CoT. The prompt optimization skill acquired during pre-training also generalizes to tasks outside the pre-training mixture, which suggests that the prompt agent does not merely memorize per-task prompts.
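The self-referential, archive-based search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Candidate`, `evolve`, `toy_rewrite`, and `toy_evaluate` are hypothetical, the LLM call and the benchmark scoring are replaced by deterministic stand-ins, parents are sampled uniformly from the archive (an assumption), and the two-stage pre-training and fine-tuning schedule is not modeled.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """One archive entry (hypothetical structure, not taken from the paper)."""
    agent_prompt: str  # system prompt of the prompt agent; it evolves too
    task_prompt: str   # system prompt of the task agent
    score: float       # evaluation score of the task prompt


def evolve(
    seed_agent_prompt: str,
    seed_task_prompt: str,
    rewrite: Callable[[str, str], str],
    evaluate: Callable[[str], float],
    steps: int,
    rng: random.Random,
) -> List[Candidate]:
    """Run an open-ended search that keeps every candidate in the archive."""
    archive = [
        Candidate(seed_agent_prompt, seed_task_prompt, evaluate(seed_task_prompt))
    ]
    for _ in range(steps):
        # Any archived candidate may serve as a stepping stone (uniform choice
        # is an assumption made for this sketch).
        parent = rng.choice(archive)
        # Self-referential step: the prompt agent rewrites its own system prompt.
        new_agent_prompt = rewrite(parent.agent_prompt, parent.agent_prompt)
        # The updated prompt agent then rewrites the task agent's system prompt.
        new_task_prompt = rewrite(new_agent_prompt, parent.task_prompt)
        archive.append(
            Candidate(new_agent_prompt, new_task_prompt, evaluate(new_task_prompt))
        )
    return archive


def toy_rewrite(agent_prompt: str, target_prompt: str) -> str:
    """Stand-in for an LLM call: append one hint chosen from the agent prompt."""
    hint = "verify" if len(agent_prompt.split()) % 2 == 0 else "decompose"
    return f"{target_prompt} {hint}"


def toy_evaluate(task_prompt: str) -> float:
    """Stand-in for benchmark accuracy: fraction of desired hints present."""
    wanted = {"verify", "decompose"}
    return len(wanted & set(task_prompt.split())) / len(wanted)


archive = evolve(
    seed_agent_prompt="improve prompts",
    seed_task_prompt="solve the task",
    rewrite=toy_rewrite,
    evaluate=toy_evaluate,
    steps=5,
    rng=random.Random(0),
)
best = max(archive, key=lambda c: c.score)
print(best.task_prompt, best.score)
```

In this sketch the same `rewrite` function is applied to both prompts, which mirrors the single-agent design: changing the prompt agent's own prompt changes how it edits the task prompt in the same step. Keeping all candidates in the archive, rather than only the current best, is what lets a lower-scoring candidate still act as a parent later.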