Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks

Abstract

Many methods for improving Large Language Model (LLM) agents on sequential decision-making tasks depend on task-specific knowledge engineering, such as prompt tuning, curated in-context examples, or customized observation and action spaces. With these approaches, agent performance improves with the quality or amount of knowledge engineering invested. We instead investigate how LLM agents can automatically improve their performance by learning in-context from their own successful experiences on similar tasks. Rather than relying on task-specific knowledge engineering, we focus on constructing and refining a database of self-generated examples. We demonstrate that even a naive accumulation of successful trajectories across training tasks boosts test performance on three benchmarks: ALFWorld (73% to 89%), Wordcraft (55% to 64%), and InterCode-SQL (75% to 79%). These gains match the performance the initial agent achieves when allowed two to three attempts per task. We then introduce two extensions: (1) database-level selection through population-based training to identify high-performing example collections, and (2) exemplar-level selection that retains individual trajectories based on their empirical utility as in-context examples. These extensions further enhance performance, achieving 91% on ALFWorld, which matches more complex approaches that employ task-specific components and prompts. Our results demonstrate that automatic trajectory database construction offers a compelling alternative to labor-intensive knowledge engineering.
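
The trajectory database described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class names, the word-overlap similarity used for retrieval, and the pruning thresholds are all assumptions made for illustration; the abstract does not specify how retrieval or utility scoring is done. The sketch shows naive accumulation of successful trajectories, retrieval of similar ones as in-context examples, and a simple form of exemplar-level selection that drops trajectories whose use rarely leads to success.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task: str          # natural-language task description
    steps: list[str]   # transcript of observations and actions
    uses: int = 0      # times retrieved as an in-context example
    wins: int = 0      # times the retrieving episode succeeded


@dataclass
class TrajectoryDB:
    """Hypothetical database of self-generated examples."""
    items: list[Trajectory] = field(default_factory=list)

    def add_if_successful(self, task, steps, success):
        # Naive accumulation: store every successful trajectory.
        if success:
            self.items.append(Trajectory(task, list(steps)))

    def retrieve(self, task, k=2):
        # Return the k stored trajectories most similar to `task`.
        # Jaccard word overlap is a stand-in assumption for whatever
        # retrieval method an actual system would use.
        query = set(task.lower().split())

        def score(t):
            words = set(t.task.lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        return sorted(self.items, key=score, reverse=True)[:k]

    def record_outcome(self, used, success):
        # Track how often each retrieved example preceded a success.
        for t in used:
            t.uses += 1
            t.wins += int(success)

    def prune(self, min_uses=2, min_rate=0.5):
        # Exemplar-level selection (assumed rule): keep a trajectory
        # until it has been tried `min_uses` times, then require a
        # success rate of at least `min_rate`.
        self.items = [
            t for t in self.items
            if t.uses < min_uses or t.wins / t.uses >= min_rate
        ]
```

In a training loop, an agent would call `retrieve` before each task, place the returned transcripts in its prompt, and then call `record_outcome` and `add_if_successful` with the episode result. Database-level selection through population-based training would maintain several such databases in parallel and keep the best-performing ones; that part is omitted here.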
