Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines
May 20, 2026
Authors: Alimurtaza Mustafa Merchant, Krish Veera, Sajal Kumar Goyla, Shambhawi Bhure, Dhaval Patel, Kaoutar El Maghraoui
cs.AI
Abstract
Industrial asset operations workflows are latency-sensitive because a single user query may require coordination across sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline incurs repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache, and a set of MCP workflow optimizations that combine disk-backed tool-discovery caching with dependency-aware parallel step execution. The MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0%, while the temporal-cache benchmark achieved a median 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, and we provide a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
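The failure mode named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, and the `SemanticCache` class, the parameter tuple (asset, sensor, time window), the similarity threshold, the TTL, and the example queries are all hypothetical choices made for illustration. The sketch compares a pure similarity lookup with one that also requires matching parameters and a fresh entry.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Entry:
    vector: Counter    # embedded query text
    params: tuple      # hypothetical validity key: (asset, sensor, time window)
    stored_at: float   # time the answer was cached, in seconds
    answer: str


class SemanticCache:
    """Similarity cache; check_validity=True adds parameter and TTL checks."""

    def __init__(self, threshold=0.8, ttl=300.0, check_validity=True):
        self.threshold = threshold
        self.ttl = ttl
        self.check_validity = check_validity
        self.entries = []

    def put(self, query, params, answer, now):
        self.entries.append(Entry(embed(query), params, now, answer))

    def get(self, query, params, now):
        query_vec = embed(query)
        best, best_sim = None, self.threshold
        for entry in self.entries:
            if self.check_validity:
                # Skip entries for other parameters or entries older than the TTL.
                if entry.params != params or now - entry.stored_at > self.ttl:
                    continue
            sim = cosine(query_vec, entry.vector)
            if sim >= best_sim:
                best, best_sim = entry, sim
        return best.answer if best is not None else None


# Made-up queries that differ only in the asset number.
Q6 = "What is the average supply temperature of Chiller 6 last week?"
Q9 = "What is the average supply temperature of Chiller 9 last week?"
P6 = ("Chiller 6", "supply_temperature", "2026-W20")
P9 = ("Chiller 9", "supply_temperature", "2026-W20")

pure = SemanticCache(check_validity=False)
temporal = SemanticCache(check_validity=True)
for cache in (pure, temporal):
    cache.put(Q6, P6, "answer-for-chiller-6", now=0.0)

pure_hit = pure.get(Q9, P9, now=10.0)              # wrong-asset answer is returned
temporal_miss = temporal.get(Q9, P9, now=10.0)     # parameters differ, so a miss
temporal_hit = temporal.get(Q6, P6, now=10.0)      # same parameters, still fresh
temporal_expired = temporal.get(Q6, P6, now=1000.0)  # older than the TTL, so a miss
```

In this sketch the two queries share ten of eleven tokens, so the pure similarity lookup clears the threshold and serves the answer cached for a different asset. Treating parameters and freshness as hard filters before the similarity comparison is one simple way to avoid that; a real system would likely extract the parameters from the query or the plan rather than receive them as an argument.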