Zero-Shot Sim-to-Real Robot Learning: A Study of Dexterous Manipulation in Reactive Catching

Abstract

Dexterous manipulation is physics-intensive and highly sensitive to modeling errors and perception noise, which makes sim-to-real transfer particularly challenging. Domain randomization (DR) is commonly used to improve the robustness of learned policies for such tasks, but conventional DR randomizes only one instance per episode and therefore offers very limited exposure to the variability of real-world dynamics. To address this limitation, we propose the Domain-Randomized Instance Set (DRIS), which represents and propagates a set of randomized instances simultaneously. This provides a richer approximation of uncertain dynamics and lets policies learn actions that account for multiple possible outcomes. Supported by theoretical analysis, our results show that DRIS yields more robust policies and alleviates the need for real-world fine-tuning, even with a modest number of instances (e.g., 10). We demonstrate this on a challenging reactive catching task. Unlike traditional catching setups, which use end-effectors designed to mechanically stabilize the object (e.g., curved or enclosing surfaces), our system uses a flat plate that offers no passive stabilization, so the task is highly sensitive to noise and requires rapid reactive motions. The learned policies exhibit strong robustness to uncertainties and achieve reliable zero-shot sim-to-real transfer.
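
The contrast between conventional DR and propagating a set of instances can be illustrated with a toy example. The abstract does not specify how DRIS represents or propagates its instances, so the sketch below is only an assumption-laden illustration: the 1-D falling-object model, the parameter ranges, and the names `Instance`, `sample_instances`, `propagate`, and `rollout` are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """One randomized instance: object state plus its sampled physics (hypothetical model)."""
    height: float    # m
    velocity: float  # m/s, positive is upward
    gravity: float   # m/s^2, randomized per instance
    drag: float      # linear drag coefficient, randomized per instance


def sample_instances(rng, n, height=1.0, velocity=0.0):
    """Draw n instances that share an initial state but differ in physics parameters."""
    return [
        Instance(height, velocity, rng.uniform(9.6, 10.0), rng.uniform(0.0, 0.2))
        for _ in range(n)
    ]


def propagate(instances, action, dt=0.01):
    """Advance every instance one time step under the same action."""
    stepped = []
    for inst in instances:
        acc = action - inst.gravity - inst.drag * inst.velocity
        vel = inst.velocity + acc * dt
        stepped.append(Instance(inst.height + vel * dt, vel, inst.gravity, inst.drag))
    return stepped


def rollout(rng, n, actions):
    """Sample n instances once per episode and propagate them together."""
    instances = sample_instances(rng, n)
    for action in actions:
        instances = propagate(instances, action)
    return instances


rng = random.Random(0)
actions = [0.0] * 50                 # no control input: the object simply falls

single = rollout(rng, 1, actions)    # conventional DR: one instance per episode
dris = rollout(rng, 10, actions)     # instance set: 10 outcomes for the same actions

heights = [inst.height for inst in dris]
spread = max(heights) - min(heights)  # how far the possible outcomes diverge
```

With a single instance, an episode yields one outcome per action sequence. With the set, the same action sequence yields a distribution of outcomes (summarized here by `spread`), which is the kind of signal a policy could use to account for multiple possible results.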