Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
May 1, 2025
Authors: Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
cs.AI
Abstract
Many methods for improving Large Language Model (LLM) agents for sequential
decision-making tasks depend on task-specific knowledge engineering--such as
prompt tuning, curated in-context examples, or customized observation and
action spaces. Using these approaches, agent performance improves with the
quality or amount of knowledge engineering invested. Instead, we investigate
how LLM agents can automatically improve their performance by learning
in-context from their own successful experiences on similar tasks. Rather than
relying on task-specific knowledge engineering, we focus on constructing and
refining a database of self-generated examples. We demonstrate that even a
naive accumulation of successful trajectories across training tasks boosts test
performance on three benchmarks: ALFWorld (73% to 89%), Wordcraft (55% to 64%),
and InterCode-SQL (75% to 79%)--matching the performance the initial agent
achieves if allowed two to three attempts per task. We then introduce two
extensions: (1) database-level selection through population-based training to
identify high-performing example collections, and (2) exemplar-level selection
that retains individual trajectories based on their empirical utility as
in-context examples. These extensions further enhance performance, achieving
91% on ALFWorld--matching more complex approaches that employ task-specific
components and prompts. Our results demonstrate that automatic trajectory
database construction offers a compelling alternative to labor-intensive
knowledge engineering.
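To make the core idea concrete, here is a minimal sketch of the "naive accumulation" baseline the abstract describes: successful trajectories from training tasks are stored in a database, and the most similar ones are retrieved as in-context examples for a new task. The class and method names (`TrajectoryDatabase`, `add_success`, `retrieve`) and the use of plain text similarity for retrieval are illustrative assumptions, not the paper's actual implementation.

```python
import difflib


class TrajectoryDatabase:
    """Hypothetical sketch: accumulate successful trajectories and
    retrieve the most similar ones as in-context examples."""

    def __init__(self, k: int = 3):
        # Each entry is (task_description, action_sequence).
        self.trajectories: list[tuple[str, list[str]]] = []
        self.k = k

    def add_success(self, task: str, actions: list[str]) -> None:
        # Naive accumulation: store every successful trajectory
        # encountered during training, with no filtering.
        self.trajectories.append((task, actions))

    def retrieve(self, task: str) -> list[tuple[str, list[str]]]:
        # Rank stored trajectories by textual similarity of their task
        # description to the new task, and return the top k to place
        # in the agent's prompt as in-context examples.
        return sorted(
            self.trajectories,
            key=lambda t: difflib.SequenceMatcher(None, t[0], task).ratio(),
            reverse=True,
        )[: self.k]


db = TrajectoryDatabase(k=1)
db.add_success(
    "put a mug in the microwave",
    ["take mug", "open microwave", "put mug in microwave"],
)
db.add_success(
    "clean a plate in the sink",
    ["take plate", "go to sink", "clean plate"],
)
examples = db.retrieve("put a cup in the microwave")
```

The paper's two extensions would slot in on top of this baseline: database-level selection would compare whole candidate databases by validation performance (via population-based training), and exemplar-level selection would drop individual stored trajectories whose measured utility as in-context examples is low.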