Zero-Shot Sim-to-Real Robot Learning: A Study of Dexterous Manipulation in Reactive Catching

Abstract

Dexterous manipulation is physics-intensive and highly sensitive to modeling errors and perception noise, which makes sim-to-real transfer particularly challenging. Domain randomization (DR) is commonly used to improve the robustness of learned policies for such tasks, but conventional DR randomizes only one instance per episode and therefore offers very limited exposure to the variability of real-world dynamics. To address this, we propose the Domain-Randomized Instance Set (DRIS), which represents and propagates a set of randomized instances simultaneously. This provides a richer approximation of uncertain dynamics and enables policies to learn actions that account for multiple possible outcomes. Supported by theoretical analysis, we show that DRIS yields more robust policies and alleviates the need for real-world fine-tuning, even with a modest number of instances (e.g., 10). We demonstrate this on a challenging reactive catching task. Unlike traditional catching setups, which use end-effectors designed to mechanically stabilize the object (e.g., curved or enclosing surfaces), our system uses a flat plate that offers no passive stabilization, so the task is highly sensitive to noise and requires rapid reactive motions. The learned policies exhibit strong robustness to uncertainties and achieve reliable zero-shot sim-to-real transfer.
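
To make the idea of propagating a set of randomized instances concrete, the following is a minimal toy sketch in Python with NumPy. It is not the authors' implementation: the one-dimensional bouncing ball, the parameter names (restitution, drag), their ranges, and the helper functions sample_instances, step_set, and rollout_set are all assumptions chosen for illustration. Conventional DR corresponds to k = 1, while a DRIS-style rollout keeps all k instances alive at every step so that a policy could observe the spread of possible outcomes.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def sample_instances(rng, k):
    """Draw k randomized physics parameter sets (ranges are hypothetical)."""
    return {
        "restitution": rng.uniform(0.5, 0.9, size=k),
        "drag": rng.uniform(0.0, 0.2, size=k),
    }


def step_set(states, params, dt=0.01):
    """Advance all k instances by one time step.

    states has shape (k, 2): column 0 is the height above the plate,
    column 1 is the vertical velocity. Each row uses its own parameters.
    """
    z, v = states[:, 0], states[:, 1]
    v = v + (-G - params["drag"] * v) * dt  # gravity plus linear drag
    z = z + v * dt
    hit = z < 0.0  # instances that passed through the plate this step
    z = np.where(hit, -z, z)  # reflect the position back above the plate
    v = np.where(hit, -params["restitution"] * v, v)  # lossy bounce
    return np.stack([z, v], axis=1)


def rollout_set(k=10, steps=100, seed=0):
    """Propagate k instances from the same initial state; return final states."""
    rng = np.random.default_rng(seed)
    params = sample_instances(rng, k)
    states = np.tile(np.array([0.3, 0.0]), (k, 1))  # drop from 0.3 m at rest
    for _ in range(steps):
        states = step_set(states, params)
    return states


if __name__ == "__main__":
    final = rollout_set(k=10)
    # The spread across instances is a crude summary of dynamics uncertainty.
    print("height mean:", final[:, 0].mean(), "height std:", final[:, 0].std())
```

In this sketch, calling rollout_set(k=1) mimics the single-instance setting, whereas k=10 matches the modest set size mentioned in the abstract. How the set is encoded as a policy input is not specified here, so the mean and standard deviation are only one possible summary.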