RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies
October 20, 2025
Authors: Adina Yakefu, Bin Xie, Chongyang Xu, Enwen Zhang, Erjin Zhou, Fan Jia, Haitao Yang, Haoqiang Fan, Haowei Zhang, Hongyang Peng, Jing Tan, Junwen Huang, Kai Liu, Kaixin Liu, Kefan Gu, Qinglun Zhang, Ruitao Zhang, Saike Huang, Shen Cheng, Shuaicheng Liu, Tiancai Wang, Tiezhen Wang, Wei Sun, Wenbin Tang, Yajun Wei, Yang Chen, Youqiang Gui, Yucheng Zhao, Yunchao Ma, Yunfei Wei, Yunhuan Yang, Yutong Guo, Ze Chen, Zhengyuan Du, Ziheng Zhang, Ziming Liu, Ziwei Yan
cs.AI
Abstract
Testing on real machines is indispensable for robotic control algorithms. In
the context of learning-based algorithms, especially VLA models, demand for
large-scale evaluation, i.e. testing a large number of models on a large number
of tasks, is becoming increasingly urgent. However, doing this right is highly
non-trivial, especially when scalability and reproducibility are taken into
account. In this report, we describe our methodology for constructing
RoboChallenge, an online evaluation system to test robotic control algorithms,
and our survey of recent state-of-the-art VLA models using our initial
benchmark Table30.