
A Zero-Shot Language Agent for Computer Control with Structured Reflection

October 12, 2023
作者: Tao Li, Gang Li, Zhiwei Deng, Bryan Wang, Yang Li
cs.AI

Abstract

Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment (e.g. MiniWoB++). To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting. Without these trace examples, it remains a challenge how an agent can autonomously learn and improve its control on a computer, which limits the ability of an agent to perform a new task. We approach this problem with a zero-shot agent that requires no given expert traces. Our agent plans for executable actions on a partially observed environment, and iteratively progresses a task by identifying and learning from its mistakes via self-reflection and structured thought management. On the easy tasks of MiniWoB++, we show that our zero-shot agent often outperforms recent SoTAs, with more efficient reasoning. For tasks with more complexity, our reflective agent performs on par with prior best models, even though previous works had the advantages of accessing expert traces or additional screen information.
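The abstract describes an agent loop of planning executable actions, attempting the task, and learning from failed attempts via self-reflection. As a rough illustrative sketch only (the class, method, and environment names below are hypothetical, not the paper's actual implementation), that plan-act-reflect cycle might look like:

```python
# Minimal sketch of a plan-act-reflect loop, as described in the abstract.
# All identifiers here are hypothetical illustrations, not the paper's API.

class ReflectiveAgent:
    def __init__(self, llm, max_trials=3):
        self.llm = llm            # callable: prompt string -> text
        self.max_trials = max_trials
        self.reflections = []     # structured notes on past mistakes

    def run(self, goal, env):
        for _ in range(self.max_trials):
            obs = env.reset()
            done, success = False, False
            while not done:
                # Plan the next action from the partial observation,
                # conditioned on reflections from earlier failed trials.
                action = self.llm(self._build_prompt(goal, obs))
                obs, done, success = env.step(action)
            if success:
                return True
            # Self-reflection: ask the LLM what went wrong and store it
            # as a structured note for the next attempt.
            self.reflections.append(
                self.llm(f"Goal: {goal}\nThe attempt failed. "
                         "Identify the mistake and how to avoid it."))
        return False

    def _build_prompt(self, goal, obs):
        notes = "\n".join(self.reflections)
        return (f"Goal: {goal}\nPast mistakes:\n{notes}\n"
                f"Observation: {obs}\nNext action:")
```

This is zero-shot in the sense the abstract uses: no expert traces are supplied; the only learning signal across trials is the agent's own reflections on its failures.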