WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents

Abstract

Can large language models (LLMs) directly serve as powerful world models for model-based agents? Although gaps exist between the prior knowledge of LLMs and the dynamics of a specified environment, our study shows that these gaps can be bridged by aligning an LLM with its deployed environment, and that such "world alignment" can be achieved efficiently by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach that learns these rules gradient-free through LLMs by inducing, updating, and pruning rules based on comparisons between agent-explored trajectories and world model predictions. The resulting world model is composed of the LLM and the learned rules. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning requires only a few principal rules rather than verbose buffered trajectories in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs in replanning time and in the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while requiring 8-20 fewer replanning rounds and only 60-80% of the tokens. In ALFWorld, its success rate surges to a new record high of 95% after only 6 iterations.
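The induce-and-prune loop described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the LLM is replaced by a stand-in predictor that always expects success, a rule is a hypothetical (action, required item) pair, and the names `base_predict`, `predict`, and `learn_rules` are invented here. The update step and the MPC planner are not covered.

```python
from typing import FrozenSet, List, Tuple

State = FrozenSet[str]                  # toy state: the agent's inventory
Transition = Tuple[State, str, bool]    # (inventory, action, succeeded)
Rule = Tuple[str, str]                  # hypothetical format: (action, required item)


def base_predict(state: State, action: str) -> bool:
    """Stand-in for the LLM prior: optimistically predicts success."""
    return True


def predict(state: State, action: str, rules: List[Rule]) -> bool:
    """World model = base predictor + rules; a violated rule forces failure."""
    for rule_action, item in rules:
        if rule_action == action and item not in state:
            return False
    return base_predict(state, action)


def learn_rules(trajectory: List[Transition], rules: List[Rule]) -> List[Rule]:
    rules = list(rules)
    # Induce: for each failure the model wrongly predicted as a success,
    # propose items present in every success of that action but missing here.
    for state, action, ok in trajectory:
        if predict(state, action, rules) == ok or ok:
            continue
        successes = [s for s, a, o in trajectory if a == action and o]
        if successes:
            common = frozenset.intersection(*successes)
            for item in sorted(common - state):
                if (action, item) not in rules:
                    rules.append((action, item))
    # Prune: drop any rule that predicts failure for an observed success.
    return [
        r for r in rules
        if not any(o and not predict(s, a, [r]) for s, a, o in trajectory)
    ]


trajectory: List[Transition] = [
    (frozenset({"stone_pickaxe", "torch"}), "mine_iron", True),
    (frozenset({"stone_pickaxe"}), "mine_iron", True),
    (frozenset({"torch"}), "mine_iron", False),
    (frozenset(), "chop_tree", True),
]
# Start from one deliberately wrong rule to show pruning.
learned = learn_rules(trajectory, [("chop_tree", "axe")])
print(learned)
```

In this toy run the wrong starting rule is pruned because it contradicts an observed success, and a single rule about `mine_iron` is induced from the mispredicted failure. No gradients are involved: the rule set changes only through comparisons between predictions and explored transitions.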
