MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Abstract

The Model Context Protocol (MCP) has unified the interface between Large Language Models (LLMs) and external tools, yet a fundamental gap remains in how agents conceptualize the environments in which they operate. Current paradigms are bifurcated: task-level planning often ignores execution-time dynamics, while reactive execution lacks long-horizon foresight. We present MCP-Cosmos, a framework that integrates generative World Models (WMs) into the MCP ecosystem to enable predictive task automation. By unifying three disparate technologies (MCP, world models, and agents), we demonstrate that a "Bring Your Own World Model" (BYOWM) strategy allows agents to simulate state transitions and refine plans in a latent space before execution. We conducted experiments with two strategies, ReAct and SPIRAL, combined with two planning models and three representative world models on more than 20 MCP-Bench tasks. We observed improvements in the agents' environment-interaction KPIs, such as tool success rate and tool parameter accuracy. The framework also introduces new metrics, such as Execution Quality, that yield new insights into the effectiveness of world models relative to the baseline.
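As a rough illustration of the "simulate before execution" idea described in the abstract, the following Python sketch scores candidate tool calls with a stand-in world model and picks the best one before anything is executed. It is not the MCP-Cosmos implementation: `ToolCall`, `TOOL_SCHEMAS`, `toy_world_model`, and `choose_call` are hypothetical names, the tool schemas are invented for the example, and the state is a plain dictionary rather than a learned latent representation.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A candidate tool invocation (hypothetical structure, for illustration)."""
    tool: str
    args: dict = field(default_factory=dict)


# Hypothetical tool schemas: tool name -> required parameter names.
TOOL_SCHEMAS = {
    "search_flights": {"origin", "destination"},
    "book_flight": {"flight_id"},
}


def toy_world_model(state: dict, call: ToolCall) -> tuple[dict, float]:
    """Predict (next_state, score) for a call without executing it.

    A hand-written stand-in for a generative world model.
    """
    required = TOOL_SCHEMAS.get(call.tool)
    if required is None:
        return state, 0.0  # unknown tool: predicted failure
    if not required.issubset(call.args):
        return state, 0.0  # missing required parameters
    if call.tool == "book_flight" and "flights" not in state:
        return state, 0.2  # booking before any search is unlikely to succeed
    next_state = dict(state)
    if call.tool == "search_flights":
        next_state["flights"] = ["F1"]
    else:
        next_state["booked"] = call.args["flight_id"]
    return next_state, 1.0


def choose_call(state: dict, candidates: list[ToolCall], world_model):
    """Simulate every candidate and return the one with the highest predicted score."""
    scored = [(world_model(state, call), call) for call in candidates]
    (next_state, score), best = max(scored, key=lambda item: item[0][1])
    return best, next_state, score


candidates = [
    ToolCall("book_flight", {"flight_id": "F1"}),
    ToolCall("search_flights", {"origin": "SFO"}),
    ToolCall("search_flights", {"origin": "SFO", "destination": "JFK"}),
]
best, predicted_state, score = choose_call({}, candidates, toy_world_model)
print(best.tool, score)
```

In this toy setting the premature booking and the under-specified search are both rejected in simulation, so only the fully specified search would be sent to the real tool. A reactive agent without the simulation step would learn about such failures only after executing the call.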
