Physics Informed Viscous Value Representations
February 26, 2026
Authors: Hrishikesh Viswanath, Juanwu Lu, S. Talha Bukhari, Damon Conover, Ziran Wang, Aniket Bera
cs.AI
Abstract
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization terms defined over first-order partial differential equations (PDEs), such as the Eikonal equation. These formulations, however, can be ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimate of the objective that avoids the numerical instability of higher-order gradients. Experiments demonstrate that our method improves geometric consistency and is broadly applicable to navigation as well as high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
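As a rough, non-authoritative sketch of the underlying idea (in our own notation, which need not match the paper's): for a fixed policy with drift b, diffusion scale \sigma, running cost c, and discount rate \rho, a linear HJB-type equation and its Feynman-Kac expectation representation take the form

\[
\rho\, V(x) \;=\; c(x) \;+\; b(x)\cdot\nabla V(x) \;+\; \tfrac{\sigma^2}{2}\,\Delta V(x),
\qquad
V(x) \;=\; \mathbb{E}\!\left[\int_0^{\infty} e^{-\rho t}\, c(X_t)\, dt \;\middle|\; X_0 = x\right],
\]

where \(dX_t = b(X_t)\,dt + \sigma\, dW_t\). The expectation on the right can be approximated by Monte Carlo rollouts of \(X_t\), so a regularizer built from it can be evaluated without differentiating the value network twice (i.e., without forming \(\Delta V\) explicitly), which is the sense in which the abstract's Feynman-Kac reformulation sidesteps higher-order gradients.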