Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning
February 8, 2024
Authors: Mohak Bhardwaj, Thomas Lampe, Michael Neunert, Francesco Romano, Abbas Abdolmaleki, Arunkumar Byravan, Markus Wulfmeier, Martin Riedmiller, Jonas Buchli
cs.AI
Abstract
Recent advances in real-world applications of reinforcement learning (RL)
have relied on the ability to accurately simulate systems at scale. However,
domains such as fluid dynamical systems exhibit complex dynamic phenomena that
are hard to simulate at high integration rates, limiting the direct application
of modern deep RL algorithms to often expensive or safety-critical hardware. In
this work, we introduce "Box o Flows", a novel benchtop experimental control
system for systematically evaluating RL algorithms in dynamic real-world
scenarios. We describe the key components of the Box o Flows, and through a
series of experiments demonstrate how state-of-the-art model-free RL algorithms
can synthesize a variety of complex behaviors via simple reward specifications.
Furthermore, we explore the role of offline RL in data-efficient hypothesis
testing by reusing past experiences. We believe that the insights gained from
this preliminary study and the availability of systems like the Box o Flows
support the way forward for developing systematic RL algorithms that can be
generally applied to complex, dynamical systems. Supplementary material and
videos of experiments are available at
https://sites.google.com/view/box-o-flows/home.