Demystifying Action Space Design for Robotic Manipulation Policies

Abstract

The specification of the action space plays a pivotal role in imitation-based robotic manipulation policy learning because it fundamentally shapes the optimization landscape. While recent advances have focused heavily on scaling training data and model capacity, the choice of action space is still guided by ad hoc heuristics or legacy designs, which leaves the principles of robotic policy design poorly understood. To address this ambiguity, we conduct a large-scale, systematic empirical study and confirm that the action space has a significant and complex impact on robotic policy learning. We dissect the action design space along temporal and spatial axes, which enables a structured analysis of how these choices govern both policy learnability and control stability. Drawing on more than 13,000 real-world rollouts on a bimanual robot and evaluations of more than 500 trained models across four scenarios, we examine the trade-offs between absolute and delta action representations, and between joint-space and task-space parameterizations. Our large-scale results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths: the former favors control stability and the latter favors generalization.
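
To make the distinction between absolute and delta action representations concrete, the following is a minimal sketch rather than the formulation used in the study. It assumes a hypothetical three-joint arm and one particular convention, in which every target in an action chunk is expressed as an offset from the robot state at the start of the chunk; the helper names `to_delta` and `to_absolute` are illustrative only.

```python
from typing import List, Sequence


def to_delta(chunk: Sequence[Sequence[float]], state: Sequence[float]) -> List[List[float]]:
    """Express each absolute target as an offset from the given state.

    Assumed convention: offsets are taken relative to the state at the
    start of the chunk, not between consecutive steps.
    """
    return [[t - s for t, s in zip(target, state)] for target in chunk]


def to_absolute(delta_chunk: Sequence[Sequence[float]], state: Sequence[float]) -> List[List[float]]:
    """Invert to_delta: add the reference state back to each offset."""
    return [[d + s for d, s in zip(delta, state)] for delta in delta_chunk]


# Hypothetical joint positions in radians for a three-joint arm.
state = [0.10, -0.50, 1.20]

# A chunk of two future absolute joint targets.
absolute_chunk = [
    [0.12, -0.48, 1.25],
    [0.15, -0.45, 1.30],
]

delta_chunk = to_delta(absolute_chunk, state)
recovered = to_absolute(delta_chunk, state)
```

Under this convention the two representations carry the same information once the current state is known, so the choice changes what the policy has to regress rather than what the robot can execute. The same conversion could be applied to task-space targets such as end-effector positions, although orientations would need a proper rotation composition instead of plain subtraction.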