SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

Abstract

Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, leading to brittle failures and avoidable tool and API costs. We propose SPIN, a planning wrapper that combines validated directed acyclic graph (DAG) planning with prefix-based execution control. SPIN enforces a strict DAG contract through `_validate_plan_text` and repair prompting, so that plans are executable before downstream execution begins. It then evaluates DAG prefixes incrementally and stops as soon as the current prefix is sufficient to answer the query. On AssetOpsBench, across 261 scenarios, SPIN reduces executed tasks from 1061 to 623, improves the Accomplished score from 0.638 to 0.706, and reduces tool calls per run from 11.81 to 6.82. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both GPT OSS1 and Llama 4 Maverick.
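
The two mechanisms named in the abstract, DAG validation and prefix-based early stopping, can be illustrated with a minimal Python sketch. This is not the SPIN implementation: the plan format (a dict mapping each task id to the ids it depends on) and the names `validate_plan`, `run_prefixes`, `execute`, and `is_sufficient` are assumptions made for illustration. The sketch validates a parsed plan rather than plan text, and it omits repair prompting.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable, Dict, List

# Hypothetical plan format: task id -> ids of the tasks it depends on.
Plan = Dict[str, List[str]]


def validate_plan(plan: Plan) -> List[str]:
    """Return a topological order of the plan, or raise ValueError if it is not a DAG."""
    for task, deps in plan.items():
        missing = [d for d in deps if d not in plan]
        if missing:
            raise ValueError(f"task {task!r} depends on unknown tasks {missing}")
    try:
        # TopologicalSorter takes a mapping of node -> predecessors.
        return list(TopologicalSorter(plan).static_order())
    except CycleError as exc:
        raise ValueError(f"plan contains a cycle: {exc.args[1]}") from exc


def run_prefixes(
    plan: Plan,
    execute: Callable[[str, Dict[str, Any]], Any],
    is_sufficient: Callable[[Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Execute tasks in dependency order, stopping once the prefix suffices."""
    order = validate_plan(plan)  # reject invalid plans before any task runs
    results: Dict[str, Any] = {}
    for task in order:
        dep_results = {d: results[d] for d in plan[task]}
        results[task] = execute(task, dep_results)
        # Assumed stop rule: a caller-supplied check on the results so far.
        if is_sufficient(results):
            break
    return results
```

In this sketch, `is_sufficient` stands in for whatever judgment decides that the executed prefix already answers the query; in practice that check could itself be an LLM call. Validating before the loop means a structurally invalid plan fails fast instead of failing partway through execution.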