SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks
May 13, 2026
Authors: Yusuke Ozaki, Dhaval Patel
cs.AI
Abstract
Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, leading to brittle failures and avoidable tool and API costs. We propose SPIN, a planning wrapper that combines validated Directed Acyclic Graph (DAG) planning with prefix-based execution control. SPIN enforces a strict DAG contract through `_validate_plan_text` and repair prompting, so that a plan is executable before downstream execution begins. It then evaluates DAG prefixes incrementally and stops as soon as the current prefix is sufficient to answer the query. On AssetOpsBench, across 261 scenarios, SPIN reduces executed tasks from 1061 to 623, improves Accomplished from 0.638 to 0.706, and reduces tool calls from 11.81 to 6.82 per run. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both GPT OSS1 and Llama 4 Maverick.
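The two ideas in the abstract, validating a plan as a DAG and then executing only as much of it as the query needs, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not show the body of `_validate_plan_text`, so `validate_plan`, `run_prefix`, the `execute` and `is_sufficient` callbacks, and the dictionary plan format below are all hypothetical names and assumptions chosen for illustration. Repair prompting is not modeled; in a real wrapper the returned errors would presumably be fed back to the planner.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable

# Assumed plan format: task id -> ids of the tasks it depends on.
Plan = dict[str, list[str]]


def validate_plan(plan: Plan) -> list[str]:
    """Return structural errors; an empty list means the plan is a valid DAG."""
    errors = []
    for task, deps in plan.items():
        for dep in deps:
            if dep == task:
                errors.append(f"{task} depends on itself")
            elif dep not in plan:
                errors.append(f"{task} depends on unknown task {dep}")
    if not errors:
        try:
            # prepare() raises CycleError if the graph contains a cycle.
            TopologicalSorter(plan).prepare()
        except CycleError as exc:
            errors.append(f"cycle detected: {exc.args[1]}")
    return errors


def run_prefix(
    plan: Plan,
    execute: Callable[[str, dict], object],
    is_sufficient: Callable[[dict], bool],
) -> dict:
    """Run tasks in dependency order, stopping once the results suffice."""
    results: dict = {}
    for task in TopologicalSorter(plan).static_order():
        results[task] = execute(task, results)
        if is_sufficient(results):
            break  # the executed prefix already answers the query
    return results


# Hypothetical three-step maintenance workflow (a simple chain).
example_plan: Plan = {
    "fetch_sensor": [],
    "detect_anomaly": ["fetch_sensor"],
    "draft_work_order": ["detect_anomaly"],
}

example_results = run_prefix(
    example_plan,
    execute=lambda task, done: f"done:{task}",
    is_sufficient=lambda done: "detect_anomaly" in done,
)
```

In this example the third task, `draft_work_order`, is never executed because the sufficiency check passes after `detect_anomaly`. Skipping unneeded trailing tasks in this way is the kind of saving the abstract reports as fewer executed tasks and tool calls.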