MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Abstract

The Model Context Protocol (MCP) has unified the interface between Large Language Models (LLMs) and external tools, yet a fundamental gap remains in how agents conceptualize the environments in which they operate. Current paradigms are bifurcated: task-level planning often ignores execution-time dynamics, while reactive execution lacks long-horizon foresight. We present MCP-Cosmos, a framework that infuses generative World Models (WMs) into the MCP ecosystem to enable predictive task automation. By unifying three disparate technologies (MCP, world models, and agents), we demonstrate that a "Bring Your Own World Model" (BYOWM) strategy allows agents to simulate state transitions and refine plans in a latent space before execution. We ran experiments with two strategies, ReAct and SPIRAL, using two planning models and three representative world models on 20+ MCP-Bench tasks. We observed improvements in the agent's environment-interaction KPIs, such as tool success rate and tool parameter accuracy. The framework also introduces new metrics, such as Execution Quality, that yield new insights into the effectiveness of world models relative to the baseline.
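To make the BYOWM idea concrete, the following is a minimal sketch of how an agent might score candidate plans against a pluggable world model before issuing any real tool call. It is an illustration under stated assumptions, not the MCP-Cosmos implementation: the names `ToolCall`, `WorldModel`, `simulate`, `select_plan`, and `ToyWorldModel` are hypothetical, the state is an opaque string standing in for whatever latent representation a real world model would use, and scoring a plan by the product of per-step success estimates is an assumed heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class ToolCall:
    """One planned tool invocation (hypothetical structure, not an MCP type)."""
    tool: str
    args: tuple[tuple[str, str], ...] = ()


class WorldModel(Protocol):
    """Hypothetical BYOWM interface: any object with this method can be plugged in."""

    def predict(self, state: str, call: ToolCall) -> tuple[str, float]:
        """Return (predicted next state, estimated probability the call succeeds)."""
        ...


def simulate(wm: WorldModel, state: str, plan: Sequence[ToolCall]) -> float:
    """Roll a plan forward inside the world model only.

    The score is the product of the per-step success estimates
    (an assumed heuristic chosen for this sketch).
    """
    score = 1.0
    for call in plan:
        state, p_success = wm.predict(state, call)
        score *= p_success
    return score


def select_plan(
    wm: WorldModel, state: str, candidates: Sequence[Sequence[ToolCall]]
) -> Sequence[ToolCall]:
    """Pick the candidate plan with the highest simulated score."""
    return max(candidates, key=lambda plan: simulate(wm, state, plan))


class ToyWorldModel:
    """Stand-in world model that only knows which tool names exist."""

    def __init__(self, known_tools: set[str]) -> None:
        self.known_tools = known_tools

    def predict(self, state: str, call: ToolCall) -> tuple[str, float]:
        ok = call.tool in self.known_tools
        # Known tools are assumed likely to succeed, unknown ones likely to fail.
        return f"{state}->{call.tool}", 0.9 if ok else 0.1


if __name__ == "__main__":
    wm = ToyWorldModel({"search", "fetch"})
    good = [ToolCall("search"), ToolCall("fetch")]
    bad = [ToolCall("search"), ToolCall("delete_all")]
    best = select_plan(wm, "s0", [bad, good])
    print([call.tool for call in best])
```

Because `WorldModel` is a structural protocol, any model exposing a compatible `predict` method could be swapped in without changing the planner, which is one plausible reading of "Bring Your Own World Model".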
