ChatPaper.ai

A Zero-Shot Language Agent for Computer Control with Structured Reflection

October 12, 2023
作者: Tao Li, Gang Li, Zhiwei Deng, Bryan Wang, Yang Li
cs.AI

Abstract

Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment (e.g., MiniWoB++). To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting. Without these trace examples, it remains a challenge for an agent to autonomously learn and improve its control of a computer, which limits its ability to perform a new task. We approach this problem with a zero-shot agent that requires no given expert traces. Our agent plans executable actions on a partially observed environment, and iteratively progresses a task by identifying and learning from its mistakes via self-reflection and structured thought management. On the easy tasks of MiniWoB++, we show that our zero-shot agent often outperforms recent state-of-the-art methods, with more efficient reasoning. For tasks with more complexity, our reflective agent performs on par with prior best models, even though previous works had the advantage of access to expert traces or additional screen information.
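The plan–act–reflect loop the abstract describes can be sketched in a few lines. This is a minimal illustration under loose assumptions, not the paper's actual method: the planner, the structured reflection store, and the toy environment (`ToyEnv`, `plan`, `run_episode`) are all hypothetical names, and a real agent would query an LLM rather than scan a candidate list.

```python
# Hedged sketch of a zero-shot agent with structured reflection.
# All names here are illustrative, not the paper's API: the real agent
# plans with an LLM over a partially observed screen; this toy version
# just avoids actions that a prior reflection marked as mistakes.

def plan(goal, observation, reflections):
    """Pick the first candidate action not ruled out by past reflections."""
    for action in observation["actions"]:
        if action not in reflections:
            return action
    return None  # every candidate has already failed


def run_episode(env, goal, max_trials=3):
    """Retry the task, recording each failed action as a structured reflection."""
    reflections = set()  # structured memory of identified mistakes
    for _ in range(max_trials):
        obs = env.reset()
        action = plan(goal, obs, reflections)
        if action is None:
            break
        if env.step(action):
            return action, True  # task completed
        reflections.add(action)  # reflect: remember the mistake for next trial
    return None, False


class ToyEnv:
    """Toy partially observed environment: one hidden correct action."""

    def __init__(self, correct="click-submit"):
        self.correct = correct

    def reset(self):
        # The agent sees candidate actions but not which one succeeds.
        return {"actions": ["click-cancel", "click-submit", "click-reset"]}

    def step(self, action):
        return action == self.correct
```

Across trials the reflection set shrinks the search space: the first episode fails on `click-cancel`, the reflection rules it out, and the second episode succeeds, mirroring the iterative self-correction the abstract attributes to the agent.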