Demystifying Action Space Design for Robotic Manipulation Policies

Abstract

The specification of the action space plays a pivotal role in imitation-based robotic manipulation policy learning and fundamentally shapes its optimization landscape. While recent advances have focused heavily on scaling training data and model capacity, the choice of action space is still guided by ad-hoc heuristics or legacy designs, leaving robotic policy design philosophies poorly understood. To address this ambiguity, we conduct a large-scale, systematic empirical study, which confirms that the action space has significant and complex effects on robotic policy learning. We dissect the action design space along temporal and spatial axes, enabling a structured analysis of how these choices govern both policy learnability and control stability. Based on more than 13,000 real-world rollouts on a bimanual robot and evaluations of more than 500 trained models across four scenarios, we examine the trade-offs between absolute and delta representations and between joint-space and task-space parameterizations. Our large-scale results suggest that properly designing the policy to predict delta actions consistently improves performance, while joint-space and task-space representations offer complementary strengths, favoring control stability and generalization, respectively.
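To make the two axes concrete, the following is a minimal sketch, not the study's implementation, of how a predicted chunk of targets can be encoded as absolute or delta actions (temporal axis) and in joint space or task space (spatial axis). It assumes that delta actions are offsets from the robot state observed when the chunk is predicted; this is one common convention, and the paper may define deltas differently. Task space is illustrated with a hypothetical 2-link planar arm: `planar_fk`, its link lengths, and the other function names are illustrative stand-ins, not names from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def to_delta(chunk: Sequence[Sequence[float]], reference: Sequence[float]) -> List[Vector]:
    """Encode absolute action targets as offsets from a reference state.

    Assumption: deltas are taken relative to the state observed when the
    chunk is predicted (one common convention, not necessarily the paper's).
    """
    return [[a - r for a, r in zip(action, reference)] for action in chunk]


def to_absolute(deltas: Sequence[Sequence[float]], reference: Sequence[float]) -> List[Vector]:
    """Invert to_delta by adding the reference state back to every offset."""
    return [[d + r for d, r in zip(delta, reference)] for delta in deltas]


def planar_fk(joints: Sequence[float], link_lengths: Tuple[float, float] = (0.3, 0.25)) -> Vector:
    """Forward kinematics of a hypothetical 2-link planar arm.

    Maps a joint-space configuration (q1, q2) in radians to a task-space
    end-effector position (x, y) in metres. The link lengths are made up.
    """
    q1, q2 = joints
    l1, l2 = link_lengths
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return [x, y]


# Current joint state and a predicted chunk of absolute joint targets (radians).
state = [0.1, -0.2]
joint_chunk = [[0.15, -0.1], [0.2, 0.0]]

# Joint space: re-express the absolute targets as deltas from the current state.
joint_deltas = to_delta(joint_chunk, state)

# Task space: map the same targets to end-effector positions,
# then express them as offsets from the current end-effector position.
task_chunk = [planar_fk(q) for q in joint_chunk]
task_deltas = to_delta(task_chunk, planar_fk(state))

print("joint deltas:", joint_deltas)
print("task deltas:", task_deltas)
```

The four encodings describe the same intended motion and differ only in how the targets are represented to the policy; a real system would substitute the robot's own kinematics for the toy `planar_fk`.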