Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning

Abstract

Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, which limits the direct application of modern deep RL algorithms to hardware that is often expensive or safety-critical. In this work, we introduce "Box o Flows", a novel benchtop experimental control system for systematically evaluating RL algorithms in dynamic real-world scenarios. We describe the key components of the Box o Flows and, through a series of experiments, demonstrate how state-of-the-art model-free RL algorithms can synthesize a variety of complex behaviors via simple reward specifications. Furthermore, we explore the role of offline RL in data-efficient hypothesis testing by reusing past experiences. We believe that the insights gained from this preliminary study, together with the availability of systems like the Box o Flows, point the way toward systematic RL algorithms that can be applied generally to complex dynamical systems. Supplementary material and videos of experiments are available at https://sites.google.com/view/box-o-flows/home.
