arXiv: 2511.03497v1

ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications

November 5, 2025
Authors: Lei Fu, Sahar Salimpour, Leonardo Militano, Harry Edelman, Jorge Peña Queralta, Giovanni Toffetti
cs.RO, cs.AI, cs.SE

Abstract

Agentic AI systems and Physical or Embodied AI systems have been two key research verticals at the forefront of Artificial Intelligence and Robotics, with the Model Context Protocol (MCP) increasingly becoming a key component and enabler of agentic applications. However, the literature at the intersection of these verticals, i.e., Agentic Embodied AI, remains scarce. This paper introduces an MCP server for analyzing ROS and ROS 2 bags, which allows robot data to be analyzed, visualized, and processed in natural language through LLMs and VLMs. We describe specific tooling built with robotics domain knowledge; our initial release focuses on mobile robotics and natively supports the analysis of trajectories, laser scan data, transforms, and time series data. The server also provides an interface to standard ROS 2 CLI tools ("ros2 bag list" or "ros2 bag info") and can filter bags down to a subset of topics or trim them in time. Coupled with the MCP server, we provide a lightweight UI for benchmarking the tooling with different LLMs, both proprietary (Anthropic, OpenAI) and open-source (through Groq). Our experimental results analyze the tool calling capabilities of eight state-of-the-art LLM/VLM models, both proprietary and open-source, large and small. The experiments indicate a large divide in tool calling capabilities, with Kimi K2 and Claude Sonnet 4 demonstrating clearly superior performance. We also conclude that multiple factors affect success rates, from the tool description schema to the number of arguments and the number of tools available to the models. The code is available under a permissive license at https://github.com/binabik-ai/mcp-rosbags.
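To make the filtering capability concrete, the following is a minimal sketch of what trimming a recording to a subset of topics and a time window could look like, together with an MCP-style tool description (a name, a description, and a JSON Schema for the arguments). It is an illustration under stated assumptions, not the code from the repository: `BagMessage`, `filter_messages`, and the `filter_bag` tool name are hypothetical, and the messages are plain Python objects rather than real ROS data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class BagMessage:
    """Hypothetical stand-in for one recorded message (not a ROS type)."""
    topic: str
    stamp: float  # seconds since the start of the recording
    data: object


def filter_messages(
    messages: Iterable[BagMessage],
    topics: Optional[Set[str]] = None,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> Iterator[BagMessage]:
    """Yield messages on the chosen topics within [start, end], inclusive.

    Any argument left as None disables that filter.
    """
    for msg in messages:
        if topics is not None and msg.topic not in topics:
            continue
        if start is not None and msg.stamp < start:
            continue
        if end is not None and msg.stamp > end:
            continue
        yield msg


# Hypothetical tool description in the shape MCP uses for tools:
# a name, a human-readable description, and a JSON Schema for the arguments.
FILTER_BAG_TOOL = {
    "name": "filter_bag",
    "description": "Keep only selected topics and/or a time window of a bag.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "topics": {"type": "array", "items": {"type": "string"}},
            "start": {"type": "number"},
            "end": {"type": "number"},
        },
        "required": [],
    },
}

# Toy recording used only for illustration.
recording = [
    BagMessage("/odom", 0.0, "pose0"),
    BagMessage("/scan", 0.5, "scan0"),
    BagMessage("/odom", 1.0, "pose1"),
    BagMessage("/odom", 2.0, "pose2"),
]

kept = list(filter_messages(recording, topics={"/odom"}, start=0.5, end=2.0))
print([m.stamp for m in kept])
```

The schema here has three optional arguments. Since the abstract reports that the description schema and the number of arguments both influence tool calling success, a small, clearly typed argument set like this is one plausible design choice.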
PDF: November 6, 2025