Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

February 21, 2024
Authors: Lucas Lehnert, Sainbayar Sukhbaatar, Paul Mcvay, Michael Rabbat, Yuandong Tian
cs.AI

Abstract

While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A* search. Searchformer is an encoder-decoder Transformer model trained to predict the search dynamics of A*. This model is then fine-tuned via expert iterations to perform fewer search steps than A* search while still generating an optimal plan. In our training method, A*'s search dynamics are expressed as a token sequence outlining when task states are added to and removed from the search tree during symbolic planning. In our ablation studies on maze navigation, we find that Searchformer significantly outperforms baselines that predict the optimal plan directly, with a 5-10× smaller model size and a 10× smaller training dataset. We also demonstrate how Searchformer scales to larger and more complex decision making tasks like Sokoban, with an improved percentage of solved tasks and shortened search dynamics.
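
To make the idea of serializing A*'s search dynamics concrete, the sketch below runs A* on a small grid and emits a flat token trace as it searches, alongside the optimal plan. The token vocabulary (create/close markers followed by state coordinates and cost tokens), the grid encoding, and the function name astar_with_trace are illustrative assumptions for this summary, not the paper's exact trace format or implementation.

```python
import heapq

def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and record its search dynamics as a
    token sequence. Illustrative vocabulary: 'create'/'close' markers
    followed by the state's coordinates and cost tokens (assumption, not
    the paper's exact format)."""
    def h(pos):  # Manhattan-distance heuristic to the goal
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    trace = []                                  # serialized search dynamics
    frontier = [(h(start), 0, start, None)]     # (f, g, state, parent)
    parents, closed = {}, set()
    trace += ["create", *map(str, start), "c0", f"c{h(start)}"]

    while frontier:
        f, g, pos, parent = heapq.heappop(frontier)
        if pos in closed:
            continue
        closed.add(pos)
        parents[pos] = parent
        trace += ["close", *map(str, pos), f"c{g}", f"c{h(pos)}"]
        if pos == goal:                         # reconstruct the optimal plan
            plan = []
            while pos is not None:
                plan.append(pos)
                pos = parents[pos]
            return trace, plan[::-1]
        x, y = pos
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in closed or grid.get(nxt, 1) == 1:   # 1 = wall / off-grid
                continue
            trace += ["create", *map(str, nxt), f"c{g + 1}", f"c{h(nxt)}"]
            heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, pos))
    return trace, None

# Example: a 3x3 open grid. In a setup like the one described above, the
# returned trace (followed by the plan tokens) would serve as the target
# sequence for the encoder-decoder model.
grid = {(i, j): 0 for i in range(3) for j in range(3)}
trace, plan = astar_with_trace(grid, (0, 0), (2, 2))
print(len(trace), plan)
```

Training on traces like this, and then shortening them via expert iteration, is the "search dynamics bootstrapping" the abstract refers to: the model first imitates the full search and is then fine-tuned on its own shorter, still-optimal rollouts.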