Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
May 1, 2025
Authors: Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
cs.AI
Abstract
Many methods for improving Large Language Model (LLM) agents for sequential
decision-making tasks depend on task-specific knowledge engineering--such as
prompt tuning, curated in-context examples, or customized observation and
action spaces. Using these approaches, agent performance improves with the
quality or amount of knowledge engineering invested. Instead, we investigate
how LLM agents can automatically improve their performance by learning
in-context from their own successful experiences on similar tasks. Rather than
relying on task-specific knowledge engineering, we focus on constructing and
refining a database of self-generated examples. We demonstrate that even a
naive accumulation of successful trajectories across training tasks boosts test
performance on three benchmarks: ALFWorld (73% to 89%), Wordcraft (55% to 64%),
and InterCode-SQL (75% to 79%)--matching the performance the initial agent
achieves if allowed two to three attempts per task. We then introduce two
extensions: (1) database-level selection through population-based training to
identify high-performing example collections, and (2) exemplar-level selection
that retains individual trajectories based on their empirical utility as
in-context examples. These extensions further enhance performance, achieving
91% on ALFWorld--matching more complex approaches that employ task-specific
components and prompts. Our results demonstrate that automatic trajectory
database construction offers a compelling alternative to labor-intensive
knowledge engineering.
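The naive-accumulation baseline described above, collecting successful trajectories across training tasks and retrieving them as in-context examples at test time, can be pictured with a short sketch. The code below is illustrative only: the names Trajectory, TrajectoryDB, and run_agent, and the toy word-overlap retrieval, are assumptions made for the example, not the paper's actual implementation.

# Minimal sketch of naive accumulation of successful trajectories plus
# retrieval of similar ones as in-context examples. All names here are
# hypothetical; run_agent stands in for an actual LLM agent rollout.
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task: str          # natural-language task description
    steps: list[str]   # interleaved observations and actions
    success: bool      # whether the episode ended in success


@dataclass
class TrajectoryDB:
    trajectories: list[Trajectory] = field(default_factory=list)

    def add_if_successful(self, traj: Trajectory) -> None:
        # Naive accumulation: keep every successful trajectory, discard failures.
        if traj.success:
            self.trajectories.append(traj)

    def retrieve(self, task: str, k: int = 3) -> list[Trajectory]:
        # Toy retrieval by word overlap between task descriptions; a real
        # system might use embedding similarity or another relevance measure.
        query = set(task.lower().split())
        scored = sorted(self.trajectories,
                        key=lambda t: len(query & set(t.task.lower().split())),
                        reverse=True)
        return scored[:k]


def run_agent(prompt: str) -> Trajectory:
    # Placeholder for running the LLM agent in the environment.
    raise NotImplementedError


def solve(task: str, db: TrajectoryDB) -> Trajectory:
    # Prepend retrieved successes as in-context examples, act, then grow the
    # database from the agent's own successful experience.
    examples = db.retrieve(task)
    context = "\n\n".join("\n".join(e.steps) for e in examples)
    traj = run_agent(f"{context}\n\nTask: {task}")
    db.add_if_successful(traj)
    return traj

The two extensions in the abstract would then operate on top of this loop: database-level selection compares whole example collections by measured downstream success, and exemplar-level selection keeps or drops individual trajectories based on how useful they prove as in-context examples.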