Generating Symbolic World Models via Test-time Scaling of Large Language Models

Abstract

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model state transitions in order to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language. To overcome this ambiguity, the Planning Domain Definition Language (PDDL) is used as a planning abstraction that enables precise, formal state descriptions. With PDDL, we can generate a symbolic world model to which classic search algorithms, such as A*, can be applied directly to find optimal plans. However, directly generating PDDL domains with current LLMs remains an open challenge because PDDL training data is scarce. To address this challenge, we propose scaling up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm that first employs Best-of-N sampling to improve the quality of the initial solution and then refines that solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin in PDDL domain generation, achieving a success rate of over 50% on two tasks (generating PDDL domains from natural language descriptions and from PDDL problems), without requiring any additional training. By taking advantage of PDDL as a state abstraction, our method outperforms current state-of-the-art methods on almost all competition-level planning tasks.
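The two-stage procedure named in the abstract (Best-of-N sampling, then iterative refinement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how candidates are scored or how the verbalized machine learning step is prompted, so `toy_generate`, `toy_score`, and `toy_rewrite` are hypothetical stand-ins for an LLM sampler, a domain-quality check, and an LLM critique-and-rewrite call, and the strings they handle are placeholders rather than valid PDDL.

```python
import random
from typing import Callable

# Hypothetical checklist standing in for a real PDDL validity check.
REQUIRED = ("(:predicates", "(:action", ":precondition", ":effect")


def toy_score(domain: str) -> float:
    """Fraction of required fragments present (assumed scoring function)."""
    return sum(piece in domain for piece in REQUIRED) / len(REQUIRED)


def toy_generate(rng: random.Random) -> str:
    """Stand-in for sampling one candidate domain from an LLM."""
    kept = [piece for piece in REQUIRED if rng.random() < 0.5]
    return "(define (domain toy) " + " ".join(kept) + ")"


def toy_rewrite(domain: str) -> str:
    """Stand-in for an LLM critique-and-rewrite step: adds one missing fragment."""
    for piece in REQUIRED:
        if piece not in domain:
            return domain[:-1] + " " + piece + ")"
    return domain


def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float], n: int) -> str:
    """Sample n candidates and keep the highest-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


def refine(initial: str, score: Callable[[str], float],
           rewrite: Callable[[str], str], steps: int) -> str:
    """Repeatedly rewrite the current best candidate, keeping only improvements."""
    best, best_score = initial, score(initial)
    for _ in range(steps):
        candidate = rewrite(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


rng = random.Random(0)
initial = best_of_n(lambda: toy_generate(rng), toy_score, n=8)
final = refine(initial, toy_score, toy_rewrite, steps=len(REQUIRED))
print(toy_score(initial), toy_score(final))
```

Keeping the sampler, scorer, and rewriter as plain callables means any of the three could be swapped for a real LLM call or a PDDL validator without changing the control flow.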
