DynaGuide: Steering Diffusion Policies with Active Dynamic Guidance

Abstract

Deploying large, complex policies in the real world requires the ability to steer them to fit the needs of a situation. Most common steering approaches, like goal-conditioning, require training the robot policy with a distribution of test-time objectives in mind. To overcome this limitation, we present DynaGuide, a steering method for diffusion policies using guidance from an external dynamics model during the diffusion denoising process. DynaGuide separates the dynamics model from the base policy, which gives it multiple advantages, including the ability to steer towards multiple objectives, enhance underrepresented base policy behaviors, and maintain robustness on low-quality objectives. The separate guidance signal also allows DynaGuide to work with off-the-shelf pretrained diffusion policies. We demonstrate the performance and features of DynaGuide against other steering approaches in a series of simulated and real experiments, showing an average steering success of 70% on a set of articulated CALVIN tasks and outperforming goal-conditioning by 5.4x when steered with low-quality objectives. We also successfully steer an off-the-shelf real robot policy to express preference for particular objects and even create novel behavior. Videos and more can be found on the project website: https://dynaguide.github.io
