Experience-Guided Adaptation of Inference-Time Strategies

Abstract

Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at inference time have been proposed, existing designs only steer the system by modifying textual input to a language model or agent, which means that they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. On the other hand, systems that adapt more flexibly require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which generates tailored strategies -- complete computational procedures involving LLM calls, tools, sampling parameters, and control logic -- dynamically at inference time based on accumulated experience. We achieve this using an LLM-based meta-strategy -- a strategy that outputs strategies -- enabling adaptation of all strategy components (prompts, sampling parameters, tool configurations, and control logic). EGuR operates through two components: a Guide generates multiple candidate strategies conditioned on the current problem and structured memory of past experiences, while a Consolidator integrates execution feedback to improve future strategy generation. This produces complete, ready-to-run strategies optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111x, with both metrics improving as the system gains experience.
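The Guide/Consolidator loop described above can be illustrated with a small toy sketch. This is not the authors' implementation. Every name here (`Strategy`, `Memory`, `guide`, `consolidate`, `solve`, `LIBRARY`) is hypothetical, the LLM calls are replaced by plain Python functions, the Guide picks from a fixed library rather than generating strategies with an LLM, and feedback is assumed to be a comparison against a known answer. It only shows how a strategy can bundle prompt, sampling parameters, tools, and control logic, and how caching a strategy that worked can lower cost on later problems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Strategy:
    """Hypothetical stand-in for a complete, ready-to-run strategy."""
    name: str
    system_prompt: str           # prompt component
    temperature: float           # sampling parameter
    tools: Tuple[str, ...]       # tool configuration
    cost: float                  # assumed fixed cost per run (illustrative units)
    run: Callable[[str], str]    # control logic; the LLM call is stubbed out


@dataclass
class Memory:
    """Structured experience: free-form notes plus a cache of strategies that worked."""
    notes: List[str] = field(default_factory=list)
    cache: Dict[str, Strategy] = field(default_factory=dict)


def add(problem: str) -> str:
    """Stub for a model call that correctly solves 'a+b' problems."""
    a, b = problem.split("+")
    return str(int(a) + int(b))


# Assumption: a fixed candidate library replaces the LLM-based meta-strategy.
LIBRARY = [
    Strategy("guess", "Answer immediately.", 1.0, (), 0.5, lambda p: "0"),
    Strategy("direct", "Compute the sum.", 0.0, (), 1.0, add),
    Strategy("tool_loop", "Use the calculator, then verify.", 0.0, ("calculator",), 10.0, add),
]


def guide(task: str, memory: Memory) -> List[Strategy]:
    """Propose candidates, reusing a cached strategy when experience exists."""
    if task in memory.cache:
        return [memory.cache[task]]
    return list(LIBRARY)


def consolidate(task: str, memory: Memory, feedback: List[Tuple[Strategy, bool]]) -> None:
    """Fold execution feedback into memory; cache the cheapest correct strategy."""
    for strategy, ok in feedback:
        memory.notes.append(f"{task}: {strategy.name} correct={ok} cost={strategy.cost}")
    winners = [strategy for strategy, ok in feedback if ok]
    if winners:
        memory.cache[task] = min(winners, key=lambda s: s.cost)


def solve(task: str, problem: str, expected: str, memory: Memory) -> Tuple[str, float]:
    """Run every candidate, record feedback, and return (answer, total cost spent)."""
    outputs: List[str] = []
    feedback: List[Tuple[Strategy, bool]] = []
    for strategy in guide(task, memory):
        out = strategy.run(problem)
        outputs.append(out)
        feedback.append((strategy, out == expected))  # assumed ground-truth feedback
    consolidate(task, memory, feedback)
    spent = sum(strategy.cost for strategy, _ in feedback)
    # Prefer the first candidate judged correct; otherwise fall back to the first output.
    answer = next((o for o, (_, ok) in zip(outputs, feedback) if ok), outputs[0])
    return answer, spent


memory = Memory()
first = solve("addition", "2+3", "5", memory)     # no experience: all three candidates run
second = solve("addition", "40+2", "42", memory)  # cached "direct" strategy is reused
print(first, second)  # ('5', 11.5) ('42', 1.0)
```

In this toy run the second problem costs 1.0 instead of 11.5 because the cheapest strategy that succeeded was cached. The numbers are invented for illustration and say nothing about the results reported in the abstract.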