Generating Symbolic World Models via Test-time Scaling of Large Language Models

Abstract

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model state transitions in order to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language. To overcome this ambiguity, the Planning Domain Definition Language (PDDL) is used as a planning abstraction that enables precise, formal state descriptions. With PDDL, we can generate a symbolic world model to which classical search algorithms, such as A*, can be applied directly to find optimal plans. However, generating PDDL domains directly with current LLMs remains an open challenge because PDDL training data is scarce. To address this challenge, we propose scaling up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm that first employs Best-of-N sampling to improve the quality of the initial solution and then refines that solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin in PDDL domain generation, achieving a success rate above 50% on two tasks (generating PDDL domains from natural language descriptions or from PDDL problems) without requiring additional training. By using PDDL as a state abstraction, our method outperforms current state-of-the-art methods on almost all competition-level planning tasks.
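The two-stage procedure described in the abstract (Best-of-N sampling followed by feedback-driven refinement) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the abstract does not specify how candidates are scored, what the refinement prompts look like, or how verbalized machine learning is implemented. The names `best_of_n`, `refine`, `toy_sample`, `toy_score`, `toy_critique`, and `toy_rewrite` are hypothetical, and the toy functions are deterministic stand-ins for LLM calls and a PDDL validator.

```python
from typing import Callable, List

# Hypothetical stand-in for a real PDDL check: parts a domain should contain.
REQUIRED = ("(:predicates", "(:action", ":precondition", ":effect")


def best_of_n(sample: Callable[[int], str],
              score: Callable[[str], float],
              n: int) -> List[str]:
    """Draw n candidates and return them ordered from best to worst score."""
    candidates = [sample(i) for i in range(n)]
    return sorted(candidates, key=score, reverse=True)


def refine(candidate: str,
           critique: Callable[[str], str],
           rewrite: Callable[[str, str], str],
           score: Callable[[str], float],
           steps: int) -> str:
    """Repeatedly critique the current best solution in text and rewrite it.

    A proposal replaces the current best only if its score does not drop.
    """
    best = candidate
    for _ in range(steps):
        feedback = critique(best)
        if not feedback:  # nothing left to fix
            break
        proposal = rewrite(best, feedback)
        if score(proposal) >= score(best):
            best = proposal
    return best


# --- Toy stand-ins (assumptions): a real system would call an LLM here. ---

def toy_sample(i: int) -> str:
    """Pretend LLM sample: candidate i contains only some required parts."""
    return "(define (domain toy) " + " ".join(REQUIRED[: (i % 3) + 1])


def toy_score(domain: str) -> float:
    """Fraction of required parts present in the candidate."""
    return sum(kw in domain for kw in REQUIRED) / len(REQUIRED)


def toy_critique(domain: str) -> str:
    """Pretend textual feedback: name the first missing part, or '' if none."""
    for kw in REQUIRED:
        if kw not in domain:
            return kw
    return ""


def toy_rewrite(domain: str, feedback: str) -> str:
    """Pretend LLM rewrite: add the part named in the feedback."""
    return domain + " " + feedback


ranked = best_of_n(toy_sample, toy_score, n=4)
final = refine(ranked[0], toy_critique, toy_rewrite, toy_score, steps=3)
print(toy_score(ranked[0]), toy_score(final))
```

In this toy run the best initial sample is incomplete and the refinement loop fills in the missing part. In a realistic setting the scorer might be a PDDL parser or planner-based validator, which is a design choice this sketch does not take from the source.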

