ScanBot: Towards Intelligent Surface Scanning in Embodied Robotic Systems

Abstract

We introduce ScanBot, a novel dataset designed for instruction-conditioned, high-precision surface scanning in robotic systems. In contrast to existing robot learning datasets that focus on coarse tasks such as grasping, navigation, or dialogue, ScanBot targets the high-precision demands of industrial laser scanning, where sub-millimeter path continuity and parameter stability are critical. The dataset covers laser scanning trajectories executed by a robot across 12 diverse objects and 6 task types: full-surface scans, geometry-focused regions, spatially referenced parts, functionally relevant structures, defect inspection, and comparative analysis. Each scan is guided by natural language instructions and paired with synchronized RGB, depth, and laser profile data, as well as robot pose and joint states. Despite recent progress, existing vision-language-action (VLA) models still fail to generate stable scanning trajectories under fine-grained instructions and real-world precision demands. To investigate this limitation, we benchmark a range of multimodal large language models (MLLMs) across the full perception-planning-execution loop, revealing persistent challenges in instruction following under realistic constraints.
