Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching

Abstract

Dexterous manipulation is physics-intensive and highly sensitive to modeling errors and perception noise, which makes sim-to-real transfer particularly challenging. Domain randomization (DR) is commonly used to improve the robustness of learned policies for such tasks, but conventional DR randomizes only one instance per episode, so the policy sees very little of the variability of real-world dynamics. To address this, we propose the Domain-Randomized Instance Set (DRIS), which represents and propagates a set of randomized instances simultaneously. This provides a richer approximation of uncertain dynamics and enables policies to learn actions that account for multiple possible outcomes. Supported by theoretical analysis, we show that DRIS yields more robust policies and reduces the need for real-world fine-tuning, even with a modest number of instances (e.g., 10). We demonstrate this on a challenging reactive catching task. Unlike traditional catching setups, which use end-effectors designed to mechanically stabilize the object (e.g., curved or enclosing surfaces), our system uses a flat plate that offers no passive stabilization, so the task is highly sensitive to noise and requires rapid reactive motions. The learned policies exhibit strong robustness to uncertainties and achieve reliable zero-shot sim-to-real transfer.
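
The contrast between one randomized instance per episode and a set of instances propagated together can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the point-mass ballistic model, the randomized parameters (`gravity`, `drag`), their ranges, and the function names `sample_instances`, `step_instances`, and `rollout_set` are all hypothetical. It only shows the general idea of advancing several randomized instances in lockstep so that a policy could observe the spread of possible outcomes.

```python
import numpy as np


def sample_instances(n, rng):
    """Draw n hypothetical dynamics parameter sets (illustrative ranges only)."""
    return {
        "gravity": rng.normal(9.81, 0.05, size=n),  # m/s^2
        "drag": rng.uniform(0.0, 0.2, size=n),      # 1/s, linear drag coefficient
    }


def step_instances(pos, vel, params, dt):
    """Advance all instances by one Euler step; pos and vel have shape (n, 3)."""
    acc = -params["drag"][:, None] * vel  # per-instance linear drag
    acc[:, 2] -= params["gravity"]        # per-instance gravity along z
    vel = vel + dt * acc
    pos = pos + dt * vel
    return pos, vel


def rollout_set(pos0, vel0, n=10, steps=50, dt=0.01, seed=0):
    """Propagate n randomized instances from the same initial state."""
    rng = np.random.default_rng(seed)
    params = sample_instances(n, rng)
    pos = np.tile(np.asarray(pos0, dtype=float), (n, 1))
    vel = np.tile(np.asarray(vel0, dtype=float), (n, 1))
    for _ in range(steps):
        pos, vel = step_instances(pos, vel, params, dt)
    return pos, vel


# Assumed initial state of a thrown object: position (m) and velocity (m/s).
pos, vel = rollout_set([0.0, 0.0, 1.0], [1.0, 0.0, 2.0])
spread = pos.std(axis=0)  # per-axis disagreement among the instances
print("mean position:", pos.mean(axis=0))
print("spread:", spread)
```

In this sketch, calling `rollout_set` with `n=1` corresponds to the conventional single-instance setting, where no spread is available. How the set is actually represented and fed to the policy in DRIS is not specified by the abstract, so the summary statistics used here are only one possible choice.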