MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Abstract

The Model Context Protocol (MCP) has unified the interface between Large Language Models (LLMs) and external tools, yet a fundamental gap remains in how agents conceptualize the environments in which they operate. Current paradigms are bifurcated: task-level planning often ignores execution-time dynamics, while reactive execution lacks long-horizon foresight. We present MCP-Cosmos, a framework that infuses generative World Models (WMs) into the MCP ecosystem to enable predictive task automation. By unifying three disparate technologies (MCP, world models, and agents), we demonstrate that a "Bring Your Own World Model" (BYOWM) strategy allows agents to simulate state transitions and refine plans in a latent space before execution. We conducted experiments with two agent strategies, ReAct and SPIRAL, combined with two planning models and three representative world models, on more than 20 MCP-Bench tasks. We observed improvements in key performance indicators (KPIs) of agent-environment interaction, such as tool success rate and tool parameter accuracy. The framework also offers new metrics, such as Execution Quality, that yield new insights into the effectiveness of world models compared with the baseline.
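To make the "simulate before execution" idea concrete, the following is a minimal sketch of what a pluggable world model interface might look like. It is not the framework's actual API: the names `ToolCall`, `WorldModel`, `simulate`, `pick_plan`, and `ToyFileWorldModel` are hypothetical, and the state is a plain symbolic dictionary rather than the latent space the abstract describes. The sketch only illustrates the general pattern of rolling candidate plans forward in a model and choosing one before any real tool is called.

```python
from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass
class ToolCall:
    """Hypothetical stand-in for one MCP tool invocation."""
    tool: str
    args: dict = field(default_factory=dict)


class WorldModel(Protocol):
    """Assumed plug-in interface; not taken from the paper."""

    def predict(self, state: dict, call: ToolCall) -> dict: ...

    def score(self, state: dict, goal: str) -> float: ...


def simulate(wm: WorldModel, state: dict, plan: Sequence[ToolCall]) -> dict:
    """Roll a plan forward inside the world model; no real tool is called."""
    for call in plan:
        state = wm.predict(state, call)
    return state


def pick_plan(wm: WorldModel, state: dict, goal: str,
              candidates: Sequence[Sequence[ToolCall]]) -> Sequence[ToolCall]:
    """Return the candidate plan whose simulated end state scores highest."""
    return max(candidates, key=lambda plan: wm.score(simulate(wm, state, plan), goal))


class ToyFileWorldModel:
    """Toy model: reading a file that does not exist yet counts as an error."""

    def predict(self, state: dict, call: ToolCall) -> dict:
        files = set(state["files"])  # copy, so the input state is not mutated
        errors = state["errors"]
        path = call.args["path"]
        if call.tool == "write_file":
            files.add(path)
        elif call.tool == "read_file" and path not in files:
            errors += 1
        return {"files": files, "errors": errors}

    def score(self, state: dict, goal: str) -> float:
        # Reward reaching the goal file, penalise each predicted tool error.
        return (1.0 if goal in state["files"] else 0.0) - state["errors"]


wm = ToyFileWorldModel()
start = {"files": set(), "errors": 0}
read = ToolCall("read_file", {"path": "notes.txt"})
write = ToolCall("write_file", {"path": "notes.txt"})
read_first = [read, write]
write_first = [write, read]

best = pick_plan(wm, start, "notes.txt", [read_first, write_first])
print([call.tool for call in best])
```

In this toy setting the model predicts that reading before writing would fail, so the write-first ordering is selected. Any object with matching `predict` and `score` methods could be swapped in, which is one plausible reading of the "Bring Your Own World Model" idea.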
