SPIN: Structured LLM Planning via Iterative Navigation for Industrial Tasks

Abstract

Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, which causes brittle failures and avoidable tool and API costs. We propose SPIN, a planning wrapper that combines validated Directed Acyclic Graph (DAG) planning with prefix-based execution control. SPIN enforces a strict DAG contract through `_validate_plan_text` and repair prompting, so that plans are executable before downstream execution begins. It then evaluates DAG prefixes incrementally and stops as soon as the current prefix is sufficient to answer the query. On AssetOpsBench, across 261 scenarios, SPIN reduces executed tasks from 1061 to 623, improves Accomplished from 0.638 to 0.706, and reduces tool calls per run from 11.81 to 6.82. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both GPT OSS1 and Llama 4 Maverick.
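The two mechanisms described above, DAG validation before execution and prefix-based early stopping, can be illustrated with a minimal sketch. This is not SPIN's implementation: the plan representation (a dict mapping each task id to the ids it depends on) and the names `validate_plan`, `run_prefixes`, `execute`, and `is_sufficient` are all hypothetical, and the repair-prompting step, which would feed the reported violations back to the LLM planner, is omitted.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable

# Assumed plan format: task id -> list of task ids it depends on.
Plan = dict[str, list[str]]


def validate_plan(plan: Plan) -> list[str]:
    """Return contract violations; an empty list means the plan is a valid DAG."""
    errors = [
        f"{task} depends on unknown task {dep}"
        for task, deps in plan.items()
        for dep in deps
        if dep not in plan
    ]
    if not errors:
        try:
            # static_order() raises CycleError when the graph contains a cycle.
            list(TopologicalSorter(plan).static_order())
        except CycleError as exc:
            errors.append(f"cycle detected: {exc.args[1]}")
    return errors


def run_prefixes(
    plan: Plan,
    execute: Callable[[str, dict[str, Any]], Any],
    is_sufficient: Callable[[dict[str, Any]], bool],
) -> dict[str, Any]:
    """Execute tasks in dependency order, stopping once the prefix suffices."""
    results: dict[str, Any] = {}
    # Any leading segment of a topological order is a dependency-closed prefix.
    for task in TopologicalSorter(plan).static_order():
        inputs = {dep: results[dep] for dep in plan[task]}
        results[task] = execute(task, inputs)
        if is_sufficient(results):
            break  # remaining tasks are skipped, saving tool calls
    return results
```

In this sketch a caller would run `validate_plan` first, re-prompt the planner while it reports violations, and only then pass the plan to `run_prefixes`. The `is_sufficient` callback stands in for whatever sufficiency check decides that the results gathered so far already answer the query.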