Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
March 24, 2025
Authors: Augusto B. Corrêa, André G. Pereira, Jendrik Seipp
cs.AI
Abstract
In recent years, large language models (LLMs) have shown remarkable
capabilities in various artificial intelligence problems. However, they fail to
plan reliably, even when prompted with a detailed definition of the planning
task. Attempts to improve their planning capabilities, such as chain-of-thought
prompting, fine-tuning, and explicit "reasoning", still yield incorrect plans
and usually fail to generalize to larger tasks. In this paper, we show how to
use LLMs to generate correct plans, even for out-of-distribution tasks of
increasing size. For a given planning domain, we ask an LLM to generate several
domain-dependent heuristic functions in the form of Python code, evaluate them
on a set of training tasks within a greedy best-first search, and choose the
strongest one. The resulting LLM-generated heuristics solve many more unseen
test tasks than state-of-the-art domain-independent heuristics for classical
planning. They are even competitive with the strongest learning algorithm for
domain-dependent planning. These findings are especially remarkable given that
our proof-of-concept implementation is based on an unoptimized Python planner
and the baselines all build upon highly optimized C++ code. In some domains,
the LLM-generated heuristics expand fewer states than the baselines, revealing
that they are not only efficiently computable, but sometimes even more
informative than the state-of-the-art heuristics. Overall, our results show
that sampling a set of planning heuristic function programs can significantly
improve the planning capabilities of LLMs.
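The pipeline described in the abstract (generate several candidate heuristics, run each inside greedy best-first search on training tasks, keep the strongest) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy domain (sorting a tuple by adjacent swaps), the three hand-written functions standing in for LLM-generated heuristics, the expansion limit, and the selection rule (most training tasks solved, then fewest total expansions).

```python
import heapq
from itertools import count


def successors(state):
    """Toy domain (assumption): swap any two adjacent entries of a tuple."""
    for i in range(len(state) - 1):
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)


def is_goal(state):
    """The goal is reached when the tuple is sorted in ascending order."""
    return all(a <= b for a, b in zip(state, state[1:]))


def gbfs(initial, heuristic, max_expansions=200):
    """Greedy best-first search: always expand the open state with lowest h.

    Returns (plan_length, expansions); plan_length is None if the
    expansion limit is hit or the open list runs empty.
    """
    tie = count()  # FIFO tie-breaking among states with equal h
    open_list = [(heuristic(initial), next(tie), initial, 0)]
    seen = {initial}
    expansions = 0
    while open_list and expansions < max_expansions:
        _, _, state, depth = heapq.heappop(open_list)
        if is_goal(state):
            return depth, expansions
        expansions += 1
        for succ in successors(state):
            if succ not in seen:
                seen.add(succ)
                heapq.heappush(
                    open_list, (heuristic(succ), next(tie), succ, depth + 1)
                )
    return None, expansions


# Hypothetical stand-ins for heuristic functions an LLM might return.
def h_blind(state):
    return 0


def h_inversions(state):
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n) if state[i] > state[j])


def h_misplaced(state):
    return sum(1 for x, y in zip(state, sorted(state)) if x != y)


def select_heuristic(candidates, training_tasks, max_expansions=200):
    """Pick the candidate that solves the most training tasks.

    Ties are broken by fewer total expansions (an assumed criterion).
    """
    def score(h):
        results = [gbfs(task, h, max_expansions) for task in training_tasks]
        solved = sum(1 for length, _ in results if length is not None)
        expanded = sum(e for _, e in results)
        return (solved, -expanded)

    return max(candidates, key=score)


CANDIDATES = [h_blind, h_inversions, h_misplaced]
TRAINING_TASKS = [(2, 0, 1), (3, 1, 0, 2), (5, 4, 3, 2, 1, 0)]
TEST_TASK = (6, 5, 4, 3, 2, 1, 0)  # larger than any training task

if __name__ == "__main__":
    best = select_heuristic(CANDIDATES, TRAINING_TASKS)
    print("selected:", best.__name__)
    print("test task (plan length, expansions):", gbfs(TEST_TASK, best))
```

In this sketch the selected function is then reused unchanged on a task larger than any seen during selection, mirroring the abstract's setup of evaluating on unseen test tasks of increasing size. Because every returned plan comes out of the search rather than directly out of a language model, a plan that is returned is valid by construction, whatever the quality of the heuristic.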