ScanBot: Towards Intelligent Surface Scanning in Embodied Robotic Systems
May 22, 2025
Authors: Zhiling Chen, Yang Zhang, Fardin Jalil Piran, Qianyu Zhou, Jiong Tang, Farhad Imani
cs.AI
Abstract
We introduce ScanBot, a novel dataset designed for instruction-conditioned,
high-precision surface scanning in robotic systems. In contrast to existing
robot learning datasets that focus on coarse tasks such as grasping,
navigation, or dialogue, ScanBot targets the high-precision demands of
industrial laser scanning, where sub-millimeter path continuity and parameter
stability are critical. The dataset covers laser scanning trajectories executed
by a robot across 12 diverse objects and 6 task types, including full-surface
scans, geometry-focused regions, spatially referenced parts, functionally
relevant structures, defect inspection, and comparative analysis. Each scan is
guided by natural language instructions and paired with synchronized RGB,
depth, and laser profiles, as well as robot pose and joint states. Despite
recent progress, existing vision-language-action (VLA) models still fail to
generate stable scanning trajectories under fine-grained instructions and
real-world precision demands. To investigate this limitation, we benchmark a
range of multimodal large language models (MLLMs) across the full
perception-planning-execution loop, revealing persistent challenges in
instruction-following under realistic constraints.
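To make the described data layout concrete, here is a minimal sketch of how one instruction-conditioned scan might be represented. The abstract names the modalities (RGB, depth, laser profiles, robot pose, joint states) and the six task types, but every class name, field name, and unit below is a hypothetical choice for illustration, not the dataset's actual schema. The `max_step_mm` helper is likewise only one assumed way to inspect path continuity, not the authors' metric.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Tuple


class TaskType(Enum):
    # The six task types listed in the abstract; identifiers are illustrative.
    FULL_SURFACE = "full-surface scan"
    GEOMETRY_FOCUSED = "geometry-focused region"
    SPATIALLY_REFERENCED = "spatially referenced part"
    FUNCTIONALLY_RELEVANT = "functionally relevant structure"
    DEFECT_INSPECTION = "defect inspection"
    COMPARATIVE_ANALYSIS = "comparative analysis"


@dataclass
class ScanFrame:
    # One synchronized time step (hypothetical layout).
    rgb_path: str
    depth_path: str
    laser_profile: List[float]                   # heights along one laser line, mm (assumed)
    tool_position_mm: Tuple[float, float, float]  # position part of the robot pose (assumed)
    joint_states: List[float]                    # joint angles, radians (assumed)


@dataclass
class ScanSample:
    # One scan: a natural language instruction plus its recorded trajectory.
    instruction: str
    object_id: str
    task_type: TaskType
    frames: List[ScanFrame]


def max_step_mm(frames: Sequence[ScanFrame]) -> float:
    """Return the largest distance between consecutive tool positions, in mm."""
    steps = [
        math.dist(a.tool_position_mm, b.tool_position_mm)
        for a, b in zip(frames, frames[1:])
    ]
    return max(steps, default=0.0)
```

Under these assumptions, a trajectory whose largest step stays below 1 mm would count as sub-millimeter continuous in this simplified sense; the paper's own criterion may differ.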