WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents
October 9, 2024
Authors: Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang
cs.AI
Abstract
Can large language models (LLMs) directly serve as powerful world models for model-based agents? While gaps exist between the prior knowledge of LLMs and the dynamics of a specified environment, our study reveals that they can be bridged by aligning an LLM with its deployed environment, and that such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach that learns these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons between agent-explored trajectories and world-model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than verbose buffered trajectories included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower replanning time and fewer tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while requiring 8-20 fewer replanning rounds and only 60-80% of the tokens. In ALFWorld, its success rate surges to a new record high of 95% after only 6 iterations.
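To make the mechanism in the abstract concrete, below is a minimal Python sketch of the two components it describes: gradient-free rule learning (inducing, updating, and pruning rules by comparing real transitions against world-model predictions) and an MPC step that optimizes look-ahead actions in the LLM+rules world model. All helpers here (`llm_predict`, `llm_induce_rules`, `task_progress`) and the `env`/`agent` interface are hypothetical placeholders for illustration, not WALL-E's actual implementation.

```python
# Sketch of WALL-E-style "world alignment" via rule learning, plus MPC.
# Every function below is a hypothetical stand-in, not the paper's API.

from dataclasses import dataclass

@dataclass
class Rule:
    text: str        # natural-language rule, e.g. "mining iron requires a stone pickaxe"
    hits: int = 0    # times the rule corrected a wrong LLM prediction
    misses: int = 0  # times the rule fired but the real transition disagreed

def llm_predict(state, action, rules):
    """Stub: query the LLM world model, conditioned on the learned rules,
    for the predicted next state."""
    raise NotImplementedError

def llm_induce_rules(mismatches):
    """Stub: ask the LLM to induce new natural-language rules that explain
    transitions the current world model mispredicted (gradient-free)."""
    raise NotImplementedError

def task_progress(state):
    """Stub: score how close a (predicted) state is to task completion."""
    raise NotImplementedError

def learn_rules(env, agent, rules, iterations=6, prune_threshold=0.5):
    """World alignment loop: compare agent-explored trajectories against
    world-model predictions, induce rules from mismatches, prune the rest."""
    for _ in range(iterations):
        state, mismatches = env.reset(), []
        while not env.done():
            action = agent.act(state, rules)
            predicted = llm_predict(state, action, rules)
            state = env.step(action)                 # real transition
            if predicted != state:                   # world model was wrong
                mismatches.append((state, action, predicted))
            # (hit/miss counters for fired rules assumed updated here; elided)
        rules.extend(llm_induce_rules(mismatches))   # induce / update
        rules[:] = [r for r in rules                 # prune unreliable rules
                    if r.hits + r.misses == 0
                    or r.hits / (r.hits + r.misses) >= prune_threshold]
    return rules

def mpc_act(state, rules, candidate_plans, horizon=5):
    """MPC step: roll out candidate look-ahead plans in the LLM+rules world
    model, then return the first action of the best-scoring plan."""
    best_plan, best_score = None, float("-inf")
    for plan in candidate_plans:                     # e.g. plans proposed by the LLM
        s, score = state, 0.0
        for action in plan[:horizon]:
            s = llm_predict(s, action, rules)        # simulate, don't act
            score += task_progress(s)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan[0]
```

Note how this mirrors the abstract's efficiency claim: the agent's prompt carries only the small pruned rule set, not a buffer of past trajectories, and MPC queries the aligned world model instead of replanning in the real environment.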