Physics Informed Viscous Value Representations

Abstract

Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. These formulations, however, can often be ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iterations. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimation of the objective that avoids the numerical instability of higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation as well as complex, high-dimensional manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
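To illustrate how the Feynman-Kac theorem turns a viscous HJB solution into an expectation that can be estimated without differentiating the value function twice, here is a minimal sketch under stated assumptions. It assumes a viscous Eikonal-type equation |grad u|^2 - eps * Laplacian(u) = 1 with u = 0 on the goal, and linearizes it with the substitution u = -eps * log(psi) (a Hopf-Cole transform). This gives 0.5 * Laplacian(psi) - psi / (2 * eps^2) = 0, whose Feynman-Kac representation is psi(x0) = E[exp(-tau / (2 * eps^2))], where tau is the first time a Brownian motion started at x0 enters the goal. The PDE form, the transform, and the hypothetical function `feynman_kac_value` with its parameters are illustrative assumptions, not the paper's formulation or its released code.

```python
import numpy as np


def feynman_kac_value(x0, in_goal, eps=0.5, dt=1e-3, t_max=5.0,
                      n_paths=10_000, seed=0):
    """Hypothetical Monte Carlo estimate of a viscous cost-to-go u(x0).

    Assumed PDE (illustrative, not necessarily the paper's exact form):
        |grad u|^2 - eps * Laplacian(u) = 1 outside the goal, u = 0 on the goal.
    The substitution u = -eps * log(psi) linearizes it to
        0.5 * Laplacian(psi) - psi / (2 * eps**2) = 0,  psi = 1 on the goal,
    whose Feynman-Kac representation is
        psi(x0) = E[exp(-tau / (2 * eps**2))],
    with tau the first time a standard Brownian motion from x0 enters the goal.
    `in_goal` maps an (m, d) array of points to an (m,) boolean mask.
    """
    rng = np.random.default_rng(seed)
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    pos = np.tile(x0, (n_paths, 1))         # (n_paths, d) particle positions
    tau = np.full(n_paths, np.inf)           # hitting times (inf = not hit yet)
    active = ~in_goal(pos)                   # paths still outside the goal
    tau[~active] = 0.0                       # paths that start inside the goal
    n_steps = int(round(t_max / dt))
    for k in range(1, n_steps + 1):
        if not active.any():
            break
        idx = np.flatnonzero(active)
        # Exact Brownian increment over one step: X_{t+dt} = X_t + sqrt(dt) * N(0, I)
        pos[idx] += np.sqrt(dt) * rng.standard_normal((idx.size, pos.shape[1]))
        hit = in_goal(pos[idx])
        tau[idx[hit]] = k * dt               # first step at which the goal is seen
        active[idx[hit]] = False
    # Paths that never hit by t_max keep tau = inf and contribute exp(-inf) = 0.
    psi = np.mean(np.exp(-tau / (2.0 * eps**2)))
    return -eps * np.log(psi)


if __name__ == "__main__":
    goal = lambda p: p[:, 0] <= 0.0          # 1-D goal set {x <= 0}
    for x in (0.5, 1.0, 2.0):
        print(x, feynman_kac_value(x, goal))
```

With the one-dimensional goal {x <= 0}, the exact solution of the assumed equation is u(x) = x for x > 0, so each estimate should land close to its starting point. The remaining gap comes from Monte Carlo noise and from checking the goal only at discrete time steps, which slightly overestimates hitting times. Note that only sampled paths are needed: no first- or second-order derivatives of u are computed.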
