Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
February 21, 2024
Authors: Lucas Lehnert, Sainbayar Sukhbaatar, Paul Mcvay, Michael Rabbat, Yuandong Tian
cs.AI
Abstract
While Transformers have enabled tremendous progress in various application
settings, such architectures still lag behind traditional symbolic planners for
solving complex decision making tasks. In this work, we demonstrate how to
train Transformers to solve complex planning tasks and present Searchformer, a
Transformer model that optimally solves previously unseen Sokoban puzzles 93.7%
of the time, while using up to 26.8% fewer search steps than standard A*
search. Searchformer is an encoder-decoder Transformer model trained to predict
the search dynamics of A*. This model is then fine-tuned via expert
iterations to perform fewer search steps than A* search while still
generating an optimal plan. In our training method, A*'s search dynamics are
expressed as a token sequence outlining when task states are added to and
removed from the search tree during symbolic planning. In our ablation studies
on maze navigation, we find that Searchformer significantly outperforms
baselines that predict the optimal plan directly, while using a 5-10 times
smaller model size and a 10 times smaller training dataset. We also demonstrate
how Searchformer
scales to larger and more complex decision making tasks like Sokoban with
improved percentage of solved tasks and shortened search dynamics.
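To make the idea of "search dynamics as a token sequence" concrete, the sketch below runs A* on a small grid maze and logs a token pattern every time a state is added to the frontier ("create") or expanded ("close"), appending the optimal plan at the end. This is a minimal illustration under assumed conventions: the grid setup, the create/close/plan token names, and the cost/heuristic encoding are illustrative choices, not the paper's exact vocabulary or implementation.

```python
import heapq

def manhattan(a, b):
    # Admissible heuristic for a 4-connected grid.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar_with_trace(walls, start, goal, width, height):
    """Run A* on a 4-connected grid and return its search dynamics as tokens."""
    h0 = manhattan(start, goal)
    frontier = [(h0, 0, start)]          # entries are (f = g + h, g, state)
    parent = {start: None}
    best_g = {start: 0}
    closed = set()
    trace = ["create", str(start), "c0", f"h{h0}"]
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node in closed:
            continue                      # stale frontier entry, skip it
        closed.add(node)
        trace += ["close", str(node), f"c{g}", f"h{f - g}"]
        if node == goal:
            break
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in walls:
                continue
            new_g = g + 1
            if nxt not in best_g or new_g < best_g[nxt]:
                best_g[nxt] = new_g
                parent[nxt] = node
                h = manhattan(nxt, goal)
                trace += ["create", str(nxt), f"c{new_g}", f"h{h}"]
                heapq.heappush(frontier, (new_g + h, new_g, nxt))
    # Append the optimal plan by walking parent pointers back from the goal.
    plan, node = [], goal
    while node is not None:
        plan.append(str(node))
        node = parent[node]
    return trace + ["plan"] + plan[::-1]

if __name__ == "__main__":
    tokens = astar_with_trace(walls={(1, 1)}, start=(0, 0),
                              goal=(2, 2), width=3, height=3)
    print(" ".join(tokens))
```

A sequence model trained on such traces (prompt: task description; target: trace plus plan) learns to imitate the search itself rather than only its final answer; shortening the trace portion during expert iteration is what lets the model perform fewer search steps than A* while still ending in an optimal plan.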