Physics Informed Viscous Value Representations
February 26, 2026
Authors: Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Damon Conover, Ziran Wang, Aniket Bera
cs.AI
Abstract
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. However, these formulations are often ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimation of the objective that avoids numerical instability in higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation and high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
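To illustrate the Feynman-Kac idea the abstract invokes, the sketch below is a toy example (not the paper's actual objective): the Feynman-Kac theorem represents the solution of a linear parabolic PDE as an expectation over stochastic trajectories, which can then be estimated by Monte Carlo sampling without computing higher-order derivatives. For the heat equation ∂u/∂t = (σ²/2) ∂²u/∂x² with initial condition u(x, 0) = g(x), the solution is u(x, t) = E[g(x + σ W_t)]. The function name `feynman_kac_mc` is our own; with g(x) = x², the exact solution u(x, t) = x² + σ²t is available for comparison.

```python
import numpy as np

def feynman_kac_mc(g, x, t, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of u(x, t) = E[g(x + sigma * W_t)],
    the Feynman-Kac representation of the heat equation
    du/dt = (sigma^2 / 2) d2u/dx2 with u(x, 0) = g(x)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)          # W_t ~ sqrt(t) * N(0, 1)
    return np.mean(g(x + sigma * np.sqrt(t) * z))

# With g(x) = x^2, the exact solution is u(x, t) = x^2 + sigma^2 * t.
estimate = feynman_kac_mc(lambda s: s**2, x=1.0, t=0.5, sigma=1.0)
exact = 1.0**2 + 1.0**2 * 0.5  # 1.5
```

The same mechanism underlies the paper's objective: replacing a PDE residual (which needs second derivatives of the value network) with a sampled expectation trades numerical differentiation for averaging over simulated trajectories.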