arXiv: 2511.03497v1

ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications

November 5, 2025
Authors: Lei Fu, Sahar Salimpour, Leonardo Militano, Harry Edelman, Jorge Peña Queralta, Giovanni Toffetti
cs.RO, cs.AI, cs.SE

Abstract

Agentic AI systems and Physical or Embodied AI systems have been two key research verticals at the forefront of Artificial Intelligence and Robotics, with the Model Context Protocol (MCP) increasingly becoming a key component and enabler of agentic applications. However, the literature at the intersection of these verticals, i.e., Agentic Embodied AI, remains scarce. This paper introduces an MCP server for analyzing ROS and ROS 2 bags, which allows robot data to be analyzed, visualized, and processed with natural language through LLMs and VLMs. We describe specific tooling built with robotics domain knowledge; our initial release focuses on mobile robotics and natively supports the analysis of trajectories, laser scan data, transforms, and time series data. The server also provides an interface to standard ROS 2 CLI tools ("ros2 bag list" or "ros2 bag info") and can filter bags to a subset of topics or trim them in time. Coupled with the MCP server, we provide a lightweight UI for benchmarking the tooling with different LLMs, both proprietary (Anthropic, OpenAI) and open-source (through Groq). Our experimental results include an analysis of the tool-calling capabilities of eight state-of-the-art LLMs/VLMs, both proprietary and open-source, large and small. Our experiments indicate a large divide in tool-calling capabilities, with Kimi K2 and Claude Sonnet 4 demonstrating clearly superior performance. We also conclude that multiple factors affect the success rates, from the tool description schema to the number of arguments and the number of tools available to the models. The code is available under a permissive license at https://github.com/binabik-ai/mcp-rosbags.
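To make the bag-filtering idea in the abstract concrete, the following is a minimal sketch of selecting a subset of topics and trimming to a time window. It is an illustration only and is not taken from the mcp-rosbags code: the `Message` record, the `filter_messages` function, and the sample topics and timestamps are all hypothetical, and messages are modeled as plain in-memory records rather than read from a real bag file.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for one recorded bag message (not a ROS type)."""
    topic: str
    stamp: float  # seconds since the start of the recording
    data: object


def filter_messages(
    messages: Iterable[Message],
    topics: Optional[Set[str]] = None,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> Iterator[Message]:
    """Yield messages on the requested topics within [start, end], inclusive.

    A filter left as None is not applied, so calling with no filters
    yields every message unchanged.
    """
    for msg in messages:
        if topics is not None and msg.topic not in topics:
            continue
        if start is not None and msg.stamp < start:
            continue
        if end is not None and msg.stamp > end:
            continue
        yield msg


# Made-up sample recording, used only to exercise the filter.
recording = [
    Message("/odom", 0.0, "pose0"),
    Message("/scan", 0.5, "scan0"),
    Message("/odom", 1.0, "pose1"),
    Message("/tf", 1.5, "tf0"),
    Message("/odom", 2.0, "pose2"),
]

trimmed = list(
    filter_messages(recording, topics={"/odom", "/scan"}, start=0.5, end=1.5)
)
print([(m.topic, m.stamp) for m in trimmed])
# prints [('/scan', 0.5), ('/odom', 1.0)]
```

Keeping every filter optional means a tool built this way would expose few required arguments, which is relevant to the abstract's observation that the number of arguments affects tool-calling success.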
PDF · November 6, 2025